Generative AI in the Real World

Emmanuel Ameisen on LLM Interpretability

Oct 2, 2025
Emmanuel Ameisen, an interpretability researcher who previously worked at Anthropic, shares fascinating insights into large language models. He dives into how these models resemble biological systems, revealing surprising patterns like multi-token planning and shared neurons across languages. Emmanuel discusses the mechanisms behind hallucinations and the importance of model calibration. He also explores practical applications in medicine and offers invaluable advice for developers on understanding and evaluating model behavior.
INSIGHT

Models Behave Like Grown Biological Systems

  • Language models are more like grown biological systems than hand-written programs.
  • Interpretability research pokes and probes model internals, much as neuroscience probes brains, to locate functional parts (see the probing sketch below).
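
A minimal sketch of the "poking and probing" idea: train a linear probe on a model's hidden states and test whether a simple property is linearly readable at a given layer. The model choice (GPT-2), the toy animal-vs-tool labels, and the layer index are illustrative assumptions, not the methods discussed in the episode, which go much further into circuit-level analysis.

```python
# Minimal linear-probe sketch (assumptions: GPT-2, toy labels, layer 6).
# Requires: pip install torch transformers scikit-learn
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Tiny toy dataset: does the sentence mention an animal (1) or a tool (0)?
texts = [
    ("The cat slept on the warm windowsill.", 1),
    ("A hammer lay forgotten in the drawer.", 0),
    ("The fox darted across the empty road.", 1),
    ("She tightened the bolt with a wrench.", 0),
    ("An owl hooted somewhere in the dark.", 1),
    ("The screwdriver rolled off the bench.", 0),
]

LAYER = 6  # which hidden-state layer to probe (illustrative choice)

def hidden_vector(text: str) -> torch.Tensor:
    """Mean-pool the hidden states of one layer for a single sentence."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

X = torch.stack([hidden_vector(t) for t, _ in texts]).numpy()
y = [label for _, label in texts]

# If a simple linear probe separates the classes, the property is
# linearly readable at this layer (on this tiny toy set).
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe training accuracy:", probe.score(X, y))
```

This is only the simplest flavor of "finding functional parts"; it tells you a feature is present at a layer, not which components compute it.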
INSIGHT

Models Plan Ahead And Share Concepts

  • Models often plan multiple tokens ahead instead of strictly predicting one token at a time.
  • Models form shared, language-agnostic concept representations, for example a single 'tall' concept reused across languages (see the cross-lingual similarity sketch below).
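
One rough, black-box way to see shared cross-lingual representations: embed a translation pair with a multilingual encoder and check that the pair is more similar to each other than to an unrelated sentence. The model (`xlm-roberta-base`), the sentences, and mean pooling are all assumptions for illustration; the episode describes feature-level analysis inside the model, which this only approximates, and a raw pretrained encoder may show only a modest gap.

```python
# Cross-lingual similarity sketch (assumptions: xlm-roberta-base, mean pooling).
# Requires: pip install torch transformers
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the final hidden states into one sentence vector."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[0].mean(dim=0)

english = embed("The building is very tall.")
french = embed("Le bâtiment est très haut.")          # same meaning, different language
unrelated = embed("I had soup for lunch yesterday.")  # different meaning, same language

print("EN vs FR (same concept):", F.cosine_similarity(english, french, dim=0).item())
print("EN vs unrelated:        ", F.cosine_similarity(english, unrelated, dim=0).item())
# If the translation pair scores noticeably higher, the representation is behaving
# in a language-agnostic way, at least under this crude sentence-level test.
```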
INSIGHT

Chains Of Thought Aren't Always Truthful

  • Reasoning-style outputs can be deceptive: the model's written chain-of-thought may not reflect real internal computation.
  • Looking at the internals can reveal the model guessing the answer rather than performing the steps it describes (see the consistency-check sketch below).
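
A simple black-box check in the spirit of chain-of-thought faithfulness experiments: corrupt an intermediate reasoning step and see whether the final answer changes. If it never does, the written reasoning was probably not what produced the answer. The `generate` callable and the arithmetic example here are hypothetical placeholders; plug in whatever model API you actually use. Internals-based methods, as discussed in the episode, go further than this.

```python
# Chain-of-thought consistency check (sketch).
# `generate` is a hypothetical placeholder: any callable mapping a prompt
# string to the model's completion string (an API client, a local model, ...).
from typing import Callable

def cot_consistency_check(
    generate: Callable[[str], str],
    question: str,
    corruption: Callable[[str], str],
) -> dict:
    """Compare the final answer with and without a corrupted reasoning step."""
    # 1. Get the model's own chain of thought.
    cot = generate(f"{question}\nThink step by step, then give the answer.")

    # 2. Corrupt the reasoning (e.g., swap a number in an intermediate step),
    #    feed both versions back, and ask only for the final answer.
    corrupted_cot = corruption(cot)
    answer_original = generate(f"{question}\n{cot}\nFinal answer:")
    answer_corrupted = generate(f"{question}\n{corrupted_cot}\nFinal answer:")

    return {
        "original_answer": answer_original.strip(),
        "corrupted_answer": answer_corrupted.strip(),
        # If corrupting the reasoning never changes the answer, the stated
        # steps are likely post-hoc rather than the real computation.
        "answer_depends_on_reasoning": answer_original.strip() != answer_corrupted.strip(),
    }

if __name__ == "__main__":
    # Dummy stand-in so the sketch runs without any model: always answers "42",
    # i.e. it behaves like a model whose answer ignores its own reasoning.
    dummy = lambda prompt: "Step 1: ... Step 2: ... The answer is 42."
    report = cot_consistency_check(
        dummy,
        question="What is 6 times 7?",
        corruption=lambda cot: cot.replace("Step 2", "Step 2 (wrong: use 8 instead of 7)"),
    )
    print(report)
```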