
Shane Legg (DeepMind Founder) - 2028 AGI, New Architectures, Aligning Superhuman Models
Dwarkesh Podcast
00:00
Evaluating Language Models: Beyond Benchmarks
This chapter explores the limitations of current benchmarks for measuring the performance of large language models, particularly their failure to capture human cognitive capacities such as episodic memory. The speakers discuss the need for more comprehensive evaluations that better reflect the complexity of human intelligence, and the potential for future advances in deep learning.