

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
Apr 14, 2025
Emmanuel Ameisen, a research engineer at Anthropic specializing in interpretability research, shares insights from his recent studies of large language models. He discusses how mechanistic interpretability methods shed light on the models' internal processes, showing how they plan ahead when writing poetry and use their own idiosyncratic algorithms for arithmetic. The conversation traces specific neural pathways, revealing how hallucinations stem from separate recognition circuits. Emmanuel also highlights the challenges of accurately interpreting AI behavior and the importance of understanding these systems for safety and reliability.
AI Snips
Mechanistic Interpretability
- Mechanistic interpretability allows researchers to analyze the specific mechanisms within LLMs.
- This shifts the focus from debating general behavior to understanding the underlying processes.
Poetry Planning
- When Claude writes rhyming poetry, it plans the rhyming word in advance.
- This challenges the notion of LLMs as simple next-word predictors.
Dictionary Learning
- Dictionary learning decomposes complex numerical representations within LLMs into simpler concepts.
- This helps in understanding what the model is "thinking" about when processing information.
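To make the dictionary learning idea concrete, here is a minimal sketch of one common approach: training a sparse autoencoder on a model's internal activation vectors so each activation is rewritten as a sparse combination of "dictionary" directions. This is an illustrative assumption about the technique in general, not Anthropic's actual implementation; the sizes, coefficients, and the random stand-in activations are hypothetical.

```python
# Minimal sparse-autoencoder sketch of dictionary learning on LLM activations.
# All names and sizes (d_model, n_features, sparsity_coeff) are illustrative
# assumptions; real work would use activations collected from an actual model.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Encoder maps dense activations into a wider, sparse feature space.
        self.encoder = nn.Linear(d_model, n_features)
        # Decoder reconstructs the activation as a sum of dictionary directions.
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # non-negative activations
        reconstruction = self.decoder(features)
        return features, reconstruction

def train_step(model, optimizer, activations, sparsity_coeff=1e-3):
    features, reconstruction = model(activations)
    recon_loss = ((reconstruction - activations) ** 2).mean()
    sparsity_loss = features.abs().mean()        # L1 penalty keeps most features at zero
    loss = recon_loss + sparsity_coeff * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    d_model, n_features = 512, 4096              # hypothetical dimensions
    model = SparseAutoencoder(d_model, n_features)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Stand-in for activation vectors that would be collected from an LLM.
    activations = torch.randn(1024, d_model)
    for step in range(100):
        train_step(model, optimizer, activations)
    # Each decoder column is a dictionary element: a direction in activation
    # space that, ideally, corresponds to a single interpretable concept.
```

The sparsity penalty is the key design choice: because only a few features fire for any given input, each learned direction tends to align with one recognizable concept, which is what makes the decomposition useful for asking what the model is "thinking" about.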