“Mech interp is not pre-paradigmatic” by Lee Sharkey

Jun 17, 2025

In this discussion, Lee Sharkey, a specialist in mechanistic interpretability, challenges the notion that mech interp is pre-paradigmatic. He traces the evolution of mechanistic interpretability through distinct waves and describes the crises that arose in the first and second. Sharkey emphasizes the role of paradigm shifts in scientific understanding and introduces the concept of parameter decomposition in neural networks. He argues that a potential third wave could resolve the field's ongoing challenges, and he invites collaboration in this emerging area.

INSIGHT

Mechinterp Lies Within CNC Paradigm

Mechanistic interpretability (Mechinterp) is not pre-paradigmatic; it sits within the established computational neuroscience/connectionism (CNC) paradigm.
It inherits concepts, methods, and standards from computational neuroscience and connectionism.

INSIGHT

First Wave and Polysemantic Neurons Crisis

The first wave of Mechinterp (2012-2021) focused on demonstrating that interpretable structure exists in neural networks.
The discovery of polysemantic neurons was a major anomaly that led to a crisis in this wave.

INSIGHT

Second Wave & Sparse Dictionary Learning Challenges

Second-wave Mechinterp (2022-present) addressed polysemanticity using the superposition hypothesis and Sparse Dictionary Learning (SDL).
Despite this progress, SDL introduced anomalies of its own, such as feature splitting and missing features.
