ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

Evaluating Language Model Contamination

This chapter examines challenges and advances in evaluating language models, focusing on benchmark contamination and techniques for detecting it. It discusses how training-data contamination affects measured model performance and presents novel statistical methods for identifying contamination.
