

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
Apr 14, 2025
Emmanuel Ameisen, a research engineer at Anthropic specializing in interpretability research, shares insights from his recent studies of large language models. He discusses how mechanistic interpretability methods shed light on the models' internal processes, showing how they plan ahead when writing poetry and use their own idiosyncratic algorithms for arithmetic. The conversation traces specific neural pathways, revealing how hallucinations stem from separate recognition circuits. Emmanuel also highlights the challenges of accurately interpreting AI behavior and the importance of understanding these systems for safety and reliability.
AI Snips
Mechanistic Interpretability
- Mechanistic interpretability allows researchers to analyze the specific mechanisms within LLMs.
- This shifts the focus from debating general behavior to understanding the underlying processes.
Poetry Planning
- When Claude writes rhyming poetry, it plans the rhyming word in advance.
- This challenges the notion of LLMs as simple next-word predictors.
Dictionary Learning
- Dictionary learning decomposes complex numerical representations within LLMs into simpler concepts.
- This helps in understanding what the model is "thinking" about when processing information.
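To make the dictionary learning idea concrete, here is a minimal sketch of one common approach: training a sparse autoencoder on a model's internal activation vectors so each activation is rewritten as a sparse combination of "dictionary" directions. This is an illustrative assumption about the technique in general, not Anthropic's actual implementation; the sizes, coefficients, and the random stand-in activations are hypothetical.

```python
# Minimal sparse-autoencoder sketch of dictionary learning on LLM activations.
# All names and sizes (d_model, n_features, sparsity_coeff) are illustrative
# assumptions; real work would use activations collected from an actual model.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Encoder maps dense activations into a wider, sparse feature space.
        self.encoder = nn.Linear(d_model, n_features)
        # Decoder reconstructs the activation as a sum of dictionary directions.
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # non-negative activations
        reconstruction = self.decoder(features)
        return features, reconstruction

def train_step(model, optimizer, activations, sparsity_coeff=1e-3):
    features, reconstruction = model(activations)
    recon_loss = ((reconstruction - activations) ** 2).mean()
    sparsity_loss = features.abs().mean()        # L1 penalty keeps most features at zero
    loss = recon_loss + sparsity_coeff * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    d_model, n_features = 512, 4096              # hypothetical dimensions
    model = SparseAutoencoder(d_model, n_features)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Stand-in for activation vectors that would be collected from an LLM.
    activations = torch.randn(1024, d_model)
    for step in range(100):
        train_step(model, optimizer, activations)
    # Each decoder column is a dictionary element: a direction in activation
    # space that, ideally, corresponds to a single interpretable concept.
```

The sparsity penalty is the key design choice: because only a few features fire for any given input, each learned direction tends to align with one recognizable concept, which is what makes the decomposition useful for asking what the model is "thinking" about.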