ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

CHAPTER

Benchmarking Complexity in AI

This chapter examines how ordinal voting systems apply to machine learning benchmarks, where individual tasks act as voters that rank models. It explains why the resulting aggregate rankings can change when irrelevant alternatives, such as unrelated models, are added or removed. It emphasizes the trade-off between a benchmark's diversity and its sensitivity to such irrelevant changes, and it discusses dynamic benchmarking efforts such as Dynabench. The discussion closes with an invitation to collaborate on this ongoing research, underscoring that AI benchmarks continue to evolve.
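To make the irrelevant-alternatives problem concrete, here is a minimal sketch in Python. It assumes the Borda count as the ordinal aggregation rule, although the chapter does not name a specific rule. It also uses a hypothetical benchmark of five tasks that rank models A and B, with model C added as the irrelevant alternative. No task changes its relative order of A and B, yet adding C flips the aggregate ranking.

```python
from collections import defaultdict


def borda(rankings):
    """Aggregate per-task rankings (best model first) with the Borda count.

    A model at position i in a ranking of m models earns m - 1 - i points.
    Returns (models sorted by total points, highest first; score dict).
    Ties are broken alphabetically by model name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, model in enumerate(ranking):
            scores[model] += m - 1 - position
    order = sorted(scores, key=lambda model: (-scores[model], model))
    return order, dict(scores)


def pairwise_order(rankings, x, y):
    """For each task, report whether model x is ranked above model y."""
    return [ranking.index(x) < ranking.index(y) for ranking in rankings]


# Hypothetical benchmark: five tasks, each producing a ranking of models.
without_c = [["A", "B"]] * 3 + [["B", "A"]] * 2
# Same tasks after adding model C; every task keeps its A-vs-B order.
with_c = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2

print(pairwise_order(without_c, "A", "B") == pairwise_order(with_c, "A", "B"))  # True
print(borda(without_c)[0])  # ['A', 'B']
print(borda(with_c)[0])     # ['B', 'A', 'C']
```

Borda is used here only because it is a familiar positional rule. The effect is not specific to it. Arrow's impossibility theorem implies that, with three or more candidates, no non-dictatorial ordinal rule satisfying the Pareto condition can avoid this behavior for every possible set of rankings.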
