
ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast


Understanding Cardinal vs. Ordinal Benchmarks in Language Model Evaluation

This chapter explores the differences between cardinal and ordinal benchmarks in language model evaluation, focusing on Stanford's HELM benchmark. It contrasts HELM with the Open LLM Leaderboard and acknowledges the community's contributions to advancing benchmarking.
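For readers unfamiliar with the distinction the chapter title draws, here is a minimal, hypothetical sketch (not from the episode) contrasting a cardinal aggregate, which averages absolute scores, with an ordinal aggregate, a mean win rate of the kind HELM reports. The model names and scores below are invented for illustration.

```python
# Hypothetical scores: model -> accuracy on three made-up scenarios.
scores = {
    "model_a": [0.82, 0.64, 0.91],
    "model_b": [0.79, 0.70, 0.88],
    "model_c": [0.60, 0.55, 0.40],
}

# Cardinal view: average the raw scores; the magnitudes themselves matter.
cardinal = {m: sum(s) / len(s) for m, s in scores.items()}

# Ordinal view: per scenario, count how often each model beats every other
# model, then average; only the ordering within each scenario matters.
def mean_win_rate(scores):
    models = list(scores)
    n_scenarios = len(next(iter(scores.values())))
    win_rates = {}
    for m in models:
        wins, comparisons = 0, 0
        for i in range(n_scenarios):
            for other in models:
                if other == m:
                    continue
                comparisons += 1
                if scores[m][i] > scores[other][i]:  # ties count as losses here
                    wins += 1
        win_rates[m] = wins / comparisons
    return win_rates

ordinal = mean_win_rate(scores)

print("cardinal (mean score):   ", cardinal)
print("ordinal  (mean win rate):", ordinal)
```

The two views can rank models differently: a model that wins most head-to-head comparisons by small margins can top the ordinal table while trailing on the cardinal average, which is the kind of trade-off the chapter discusses.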
