LessWrong (Curated & Popular)

“Mech interp is not pre-paradigmatic” by Lee Sharkey

Jun 17, 2025
In this discussion, Lee Sharkey, a specialist in mechanistic interpretability, challenges the notion that mech interp is pre-paradigmatic. He traces the evolution of mechanistic interpretability through distinct waves and examines the crises that arose in both the first and second waves. Sharkey emphasizes the importance of paradigm shifts in scientific understanding and introduces the concept of parameter decomposition in neural networks. He argues that a potential third wave could resolve ongoing challenges, and he invites collaboration in this emerging field.
INSIGHT

Mechinterp Lies Within CNC Paradigm

  • Mechanistic interpretability (Mechinterp) is not pre-paradigmatic; it lies within the established computational neuroscience and connectionism (CNC) paradigm.
  • It inherits its concepts, methods, and standards from those two fields.
INSIGHT

First Wave and Polysemantic Neurons Crisis

  • The first wave of Mechinterp (2012-2021) focused on demonstrating that interpretable structure exists in neural networks.
  • The discovery of polysemantic neurons was a major anomaly that led to a crisis in this wave.
INSIGHT

Second Wave & Sparse Dictionary Learning Challenges

  • Second-wave Mechinterp (2022-present) addressed polysemanticity using the superposition hypothesis and Sparse Dictionary Learning (SDL).
  • Despite progress, SDL introduced new anomalies such as feature splitting and missing features.