Generative Now | AI Builders on Creating the Future

Inside the Black Box: The Urgency of AI Interpretability

Oct 2, 2025
Jack Lindsey, a researcher at Anthropic with a background in theoretical neuroscience, is joined by Tom McGrath, co-founder and Chief Scientist at Goodfire and a former member of DeepMind's interpretability team. They tackle the critical topic of AI interpretability, discussing why understanding modern AI models is urgent for safety and reliability. They explore technical challenges, real-world applications, and how larger models complicate analysis. Insights from neuroscience inform their work, and they make the case for interpretability as essential for trustworthy AI.
AI Snips

Models Outpace Our Understanding

  • Models are outpacing our understanding, creating unacceptable risk as they're used in high-stakes tasks.
  • We need ways to trust model reasoning even when humans can't verify every output.

Multiple Ways To Explain Why

  • Mechanistic interpretability asks "why" by describing structures and causal mechanisms in models' computations (see the sketch after this list).
  • Broader interpretability also includes data and utility explanations to fully explain model behavior.
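The guests' actual tooling isn't shown in the episode notes, but one common way to "describe causal mechanisms in a model's computation" is activation patching: run the model on a clean and a corrupted input, swap the clean hidden activation into the corrupted run, and see how much of the clean behavior is restored. The sketch below is a minimal, hypothetical illustration on a toy network (ToyMLP, clean, and corrupted are made-up names, not anything from the episode):

```python
# Minimal sketch of activation patching, a common mechanistic-interpretability
# technique. Toy model and inputs are hypothetical stand-ins for a real transformer.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyMLP(nn.Module):
    def __init__(self, d_in=8, d_hidden=16, d_out=2):
        super().__init__()
        self.layer1 = nn.Linear(d_in, d_hidden)
        self.layer2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, patch_hidden=None):
        h = torch.relu(self.layer1(x))
        if patch_hidden is not None:
            h = patch_hidden  # causal intervention: overwrite the hidden activation
        return self.layer2(h)

model = ToyMLP()
clean = torch.randn(1, 8)              # input that produces the behavior of interest
corrupted = clean + torch.randn(1, 8)  # perturbed input that changes the behavior

with torch.no_grad():
    clean_hidden = torch.relu(model.layer1(clean))  # cache the clean activation
    clean_out = model(clean)
    corrupted_out = model(corrupted)
    patched_out = model(corrupted, patch_hidden=clean_hidden)

# If patching restores the clean output, layer1's activation causally carries
# the information responsible for this behavior.
print("clean:    ", clean_out)
print("corrupted:", corrupted_out)
print("patched:  ", patched_out)
```

In real interpretability work this kind of intervention is typically applied to individual transformer layers, attention heads, or learned features rather than a toy MLP, but the causal logic is the same.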

Interpretability Is Reverse-Engineering Biology

  • Neural networks are not human-written programs; their behavior emerges from training, creating a reverse-engineering problem.
  • Interpretability resembles biology: we must discover hierarchical abstractions to explain complex, distributed systems.