
Moving the AGI Goal Posts: AI Skepticism with Sayash Kapoor
Scaling Laws
00:00
Challenging AI Evaluation: Benchmarks and Real-World Skills
This chapter critiques existing evaluation benchmarks for AI models, such as SWE-bench and GPQA, arguing that they fail to capture real-world applicability. The discussion uses extreme scenarios to illustrate the gap between benchmark performance and the skills actually needed in professional settings.