Moving the AGI Goal Posts: AI Skepticism with Sayash Kapoor

Scaling Laws

Challenging AI Evaluation: Benchmarks and Real-World Skills

This chapter critiques existing evaluation benchmarks for AI models, such as SWE-bench and GPQA, arguing that they fail to capture real-world applicability. The discussion uses extreme scenarios to illustrate the gap between benchmark performance and the skills actually needed in professional settings.
