The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

Apr 14, 2025
Emmanuel Ameisen, a research engineer at Anthropic specializing in interpretability research, shares insights from his recent studies on large language models. He discusses how mechanistic interpretability methods shed light on models' internal processes, showing how models plan ahead in creative tasks like poetry and perform arithmetic using their own idiosyncratic algorithms. The conversation traces neural pathways inside the model, revealing how hallucinations arise from separate recognition circuits. Emmanuel also highlights the challenges of accurately interpreting AI behavior and the importance of understanding these systems for safety and reliability.
INSIGHT

Mechanistic Interpretability

  • Mechanistic interpretability allows researchers to analyze the specific mechanisms within LLMs.
  • This shifts the focus from debating general behavior to understanding the underlying processes.
ANECDOTE

Poetry Planning

  • When Claude writes rhyming poetry, it plans the rhyming word in advance.
  • This challenges the notion of LLMs as simple next-word predictors.
INSIGHT

Dictionary Learning

  • Dictionary learning decomposes complex numerical representations within LLMs into simpler concepts.
  • This helps in understanding what the model is "thinking" about when processing information.
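The decomposition described above can be illustrated with a toy sketch: express a dense activation vector as a sparse combination of named "feature" directions via greedy matching pursuit. All vectors, feature names, and dimensions here are invented for illustration; real dictionary learning on LLM activations operates on much higher-dimensional data with learned (not hand-picked) dictionaries.

```python
# Toy sketch of the idea behind dictionary learning: decomposing a dense
# activation vector into a sparse combination of interpretable feature
# directions. Features and values below are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_decompose(activation, dictionary, k=2):
    """Greedy matching pursuit: pick the k unit-norm dictionary features
    whose directions best explain the remaining residual activation."""
    residual = list(activation)
    coeffs = {}
    for _ in range(k):
        # choose the feature most aligned with the residual
        name, feat = max(dictionary.items(),
                         key=lambda nf: abs(dot(residual, nf[1])))
        c = dot(residual, feat)
        coeffs[name] = coeffs.get(name, 0.0) + c
        # subtract the explained component from the residual
        residual = [r - c * f for r, f in zip(residual, feat)]
    return coeffs

# Hypothetical unit-norm feature directions in a 3-d "activation space".
features = {
    "landmark":   [1.0, 0.0, 0.0],
    "rhyme_plan": [0.0, 1.0, 0.0],
    "arithmetic": [0.0, 0.0, 1.0],
}

activation = [0.9, 0.1, 0.0]  # mostly aligned with the "landmark" feature
print(sparse_decompose(activation, features, k=2))
# the coefficients indicate which concepts dominate this activation
```

The sparse coefficients are what make the decomposition readable: instead of thousands of raw neuron values, a handful of named features with weights approximates what the model is representing at that point.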