
Shane Legg (DeepMind Founder) - 2028 AGI, New Architectures, Aligning Superhuman Models
Dwarkesh Podcast
Evaluating Language Models: Beyond Benchmarks
This chapter explores the limitations of current benchmarks for measuring the performance of large language models, particularly with respect to aspects of human cognition such as episodic memory. The speakers discuss the need for more comprehensive evaluations that better reflect the complexity of human intelligence, and the potential for future advances in deep learning.
00:00