Shane Legg (DeepMind Founder) - 2028 AGI, New Architectures, Aligning Superhuman Models

Dwarkesh Podcast

Evaluating Language Models: Beyond Benchmarks

This chapter explores the limitations of current benchmarks for measuring the performance of large language models, particularly on aspects of human cognition such as episodic memory. The speakers discuss the need for more comprehensive evaluations that better reflect the complexity of human intelligence, as well as the potential for future advances in deep learning.