ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

Benchmarking Complexity in AI

This chapter examines ordinal voting systems as a way to aggregate machine learning benchmark results, and their vulnerability to irrelevant alternatives: adding or removing one candidate can change the relative ranking of the others. It highlights the trade-off between sensitivity and diversity when evaluating models, and discusses DynaBench as an approach to dynamic benchmarking. The discussion closes with an invitation to collaborate on the ongoing research, since benchmarks continue to evolve alongside AI development.
