"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Untangling Neural Network Mechanisms: Goodfire's Lee Sharkey on Parameter-based Interpretability

Aug 27, 2025
Lee Sharkey, Principal Investigator at Goodfire, focuses on mechanistic interpretability in AI. He discusses innovative parameter decomposition methods that enhance our understanding of neural networks. Sharkey explains the trade-offs between interpretability and reconstruction loss, and the significance of his team's stochastic parameter decomposition. The conversation also touches on the complexities of decomposing neural networks and their implications for unlearning in AI. His insights offer a fresh perspective on navigating the intricate world of AI mechanisms.
ANECDOTE

Model Unmerging Visualized

  • Lee describes attribution-based parameter decomposition as a kind of "model unmerging": treating the trained network as a merged mixture-of-experts to be separated back into its parts.
  • He visualizes expanding the network into vertical slices that sum back to the original weights.
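The "slices that sum back to the original weights" idea can be sketched in a few lines of numpy. This is a toy illustration only: the matrix size, the number of components, and the column-slicing scheme are assumptions for demonstration, not Goodfire's actual decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight matrix standing in for one layer of a trained network.
W = rng.normal(size=(8, 8))

# "Unmerge" W into k parameter components. Vertical column slices are
# just one easy choice; any set of matrices summing to W would qualify.
k = 4
slices = np.split(W, k, axis=1)  # four (8, 2) slices

# Re-embed each slice into a full-size matrix that is zero elsewhere,
# so every component lives in the same parameter space as W.
expanded = []
for i, s in enumerate(slices):
    P = np.zeros_like(W)
    P[:, i * 2:(i + 1) * 2] = s
    expanded.append(P)

# Faithfulness condition: the components sum back to the original weights.
assert np.allclose(sum(expanded), W)
```

The point of the visualization is the last line: whatever interpretability structure the components carry, adding them together must reproduce the original network exactly.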
INSIGHT

Limitations Of Gradient Attributions

  • Attribution-based decomposition suffered from high memory cost, hyperparameter sensitivity, and gradient-attribution failures.
  • Components at local optima could show near-zero gradients and be misattributed as unimportant.
INSIGHT

Rank-One Subcomponents Plus Stochastic Masking

  • Stochastic Parameter Decomposition (SPD) instead breaks weight matrices into many rank-one subcomponents.
  • SPD uses learned stochastic masking to estimate each subcomponent's causal importance more robustly.
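A rough numpy sketch of the two ingredients named above: rank-one subcomponents whose sum forms a weight matrix, and random masking used to probe each subcomponent's causal importance. All sizes, the sampling scheme, and the importance estimate are illustrative assumptions; SPD itself learns the masking distribution rather than fixing it, so this is a simplified stand-in, not the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 6, 4, 8  # toy layer sizes and subcomponent count (assumed)

# Each subcomponent i is a rank-one outer product u_i v_i^T;
# the layer's weight matrix is their sum.
U = rng.normal(size=(k, d_out))
V = rng.normal(size=(k, d_in))
W = np.einsum('ko,ki->oi', U, V)  # sum_i u_i v_i^T, shape (d_out, d_in)

x = rng.normal(size=(d_in,))
full_out = W @ x

# Stochastic masking (simplified): randomly drop subcomponents and
# attribute the resulting output change to the dropped ones.
importance = np.zeros(k)
n_samples = 200
for _ in range(n_samples):
    keep = rng.random(k) < 0.5  # random keep/drop pattern
    W_masked = np.einsum('k,ko,ki->oi', keep.astype(float), U, V)
    err = np.linalg.norm(full_out - W_masked @ x)
    dropped = ~keep
    if dropped.any():
        importance[dropped] += err / dropped.sum()

importance /= n_samples  # higher = ablating it changes the output more
```

Because importance is measured by actually ablating subcomponents rather than by reading off gradients, it avoids the failure mode described above, where a component sitting at a local optimum shows a near-zero gradient and gets misattributed as unimportant.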