Four: Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters

The 80000 Hours Podcast on Artificial Intelligence

Mechanistic Interpretability at DeepMind

This chapter explains why mechanistic interpretability matters for understanding how AI systems produce their outputs. It covers the difficulty of interpreting large language models and how even intermediate progress could help identify failure modes, and it touches on DeepMind's work in this area.
