

Inside the Black Box: The Urgency of AI Interpretability
Oct 2, 2025
Jack Lindsey, a researcher at Anthropic with a background in theoretical neuroscience, teams up with Tom McGrath, co-founder and Chief Scientist at Goodfire and a former member of DeepMind's interpretability team. They tackle the critical topic of AI interpretability, discussing why understanding modern AI models is urgent for safety and reliability. They explore technical challenges, real-world applications, and how larger models complicate analysis. Insights from neuroscience inform their work, making the case for interpretability as essential to trustworthy AI.
AI Snips
Models Outpace Our Understanding
- Models are outpacing our understanding, creating unacceptable risk as they're used in high-stakes tasks.
- We need ways to trust model reasoning even when humans can't verify every output.
Multiple Ways To Explain Why
- Mechanistic interpretability asks "why" by describing the structures and causal mechanisms inside a model's computations (see the sketch after this list).
- Broader interpretability also includes explanations grounded in the training data and in utility (the model's objective), to fully account for model behavior.
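
The "why" question in the first snip is typically probed with causal interventions on a model's internal activations. Below is a minimal, illustrative sketch of one such intervention, activation patching, on a hypothetical toy PyTorch model; the model, inputs, and the choice to patch the first half of the hidden units are assumptions for illustration, not details from the episode.

```python
# Minimal sketch of "activation patching", one mechanistic-interpretability
# technique for testing whether part of a model's internal computation
# causally drives its output. Toy model and inputs are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny two-layer MLP standing in for a trained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

clean_input = torch.randn(1, 4)      # input whose behavior we want to explain
corrupted_input = torch.randn(1, 4)  # contrasting input that behaves differently

# 1. Record the hidden activations from the clean run.
cache = {}
def save_hook(module, inputs, output):
    cache["hidden"] = output.detach().clone()

handle = model[1].register_forward_hook(save_hook)
with torch.no_grad():
    clean_logits = model(clean_input)
handle.remove()

# 2. Re-run on the corrupted input, but splice the first half of the clean
#    hidden units back in. If the output moves toward the clean run's output,
#    those units causally carry the relevant information.
def patch_hook(module, inputs, output):
    patched = output.clone()
    patched[:, :4] = cache["hidden"][:, :4]
    return patched  # returning a tensor from a forward hook replaces the output

handle = model[1].register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(corrupted_input)
handle.remove()

with torch.no_grad():
    corrupted_logits = model(corrupted_input)

print("clean:    ", clean_logits)
print("corrupted:", corrupted_logits)
print("patched:  ", patched_logits)  # partially shifted toward the clean logits
```

In practice the same idea is applied to individual attention heads, neurons, or learned features at specific token positions in a transformer, rather than to half of a toy hidden layer.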
Interpretability Is Reverse-Engineering, Like Biology
- Neural networks are not human-written programs; their behavior emerges from training, creating a reverse-engineering problem.
- Interpretability resembles biology: we must discover hierarchical abstractions to explain complex, distributed systems.