Shane Legg (DeepMind Founder) - 2028 AGI, New Architectures, Aligning Superhuman Models

Dwarkesh Podcast

CHAPTER

Evaluating Language Models: Beyond Benchmarks

This chapter explores the limitations of current benchmarks for measuring the performance of large language models, particularly their failure to capture aspects of human cognition such as episodic memory. The speakers discuss the need for more comprehensive evaluations that better reflect the complexity of human intelligence, as well as the potential for future advances in deep learning.

00:00