Emmanuel Ameisen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html ), one of a pair of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html ).
We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing
This is a two-part episode: an intro covering the open source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus ) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky ). Thanks to Vibhu for making this episode happen!
While the original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!), the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph ) released with the open source tooling now let you explore on your own, as we show you in the video version of this pod.
Chapters
- 00:00 Intro & Guest Introductions
- 01:00 Anthropic's Circuit Tracing Release
- 06:11 Exploring Circuit Tracing Tools & Demos
- 13:01 Model Behaviors and User Experiments
- 17:02 Behind the Research: Team and Community
- 24:19 Main Episode Start: Mech Interp Backgrounds
- 25:56 Getting Into Mech Interp Research
- 31:52 History and Foundations of Mech Interp
- 37:05 Core Concepts: Superposition & Features
- 39:54 Applications & Interventions in Models
- 45:59 Challenges & Open Questions in Interpretability
- 57:15 Understanding Model Mechanisms: Circuits & Reasoning
- 01:04:24 Model Planning, Reasoning, and Attribution Graphs
- 01:30:52 Faithfulness, Deception, and Parallel Circuits
- 01:40:16 Publishing Risks, Open Research, and Visualization
- 01:49:33 Barriers, Vision, and Call to Action