AXRP - the AI X-risk Research Podcast

41 - Lee Sharkey on Attribution-based Parameter Decomposition

Jun 3, 2025
Lee Sharkey, an interpretability researcher at Goodfire and co-founder of Apollo Research, shares his insights into Attribution-based Parameter Decomposition (APD). He explains how APD can simplify neural networks while maintaining fidelity, discusses the trade-offs between model complexity and performance, and delves into hyperparameter selection. Sharkey also draws analogies between neural network components and car parts, highlighting the importance of understanding feature geometry. The conversation explores future applications of APD and its potential for optimizing neural network efficiency.
INSIGHT

Core of APD Explained

  • APD decomposes a neural network's parameters into component mechanisms that sum to the original parameters.
  • It optimizes for faithfulness, minimality of active components per forward pass, and simplicity of each component.
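The three objectives above can be sketched as toy loss terms. This is a minimal illustration, not the APD implementation: the weight matrix, component count, and the use of a nuclear norm as a simplicity proxy are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a single weight matrix standing in for all parameters.
W = rng.normal(size=(4, 4))

# Decompose W into C candidate parameter components (hypothetical setup).
C = 3
components = rng.normal(size=(C, 4, 4))

# Faithfulness: the components should sum to the original parameters.
faithfulness_loss = np.sum((W - components.sum(axis=0)) ** 2)

# Simplicity: penalize each component's complexity; here a nuclear norm
# is used as an illustrative proxy that favors low-rank components.
simplicity_loss = sum(np.linalg.norm(c, ord="nuc") for c in components)

# (Minimality would additionally restrict how many components are active
# on each forward pass; see the top-K attribution step below in spirit.)
total_loss = faithfulness_loss + simplicity_loss
```

In the real method these terms are weighted by hyperparameters and minimized jointly by gradient descent; the sketch only shows how each pressure is expressed as a loss.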
INSIGHT

Top K Method for Minimality

  • APD uses a top K active parameter components approach per forward pass to optimize minimality.
  • It attributes importance to components via gradients and updates them to increase their causal role in outputs.
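The top-K attribution step can be sketched on a toy linear model. Everything here (a single linear layer, first-order gradient-times-parameter attributions, the component count and K) is an illustrative assumption, not APD's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, k = 3, 4, 2

# Original weights, and C parameter components adjusted to sum exactly to W.
W = rng.normal(size=(d, d))
raw = rng.normal(size=(C, d, d))
components = raw + (W - raw.sum(axis=0)) / C  # enforce faithfulness

x = rng.normal(size=d)

# Gradient of the summed output of y = x @ W with respect to W is outer(x, 1).
grad_W = np.outer(x, np.ones(d))

# Attribute importance to each component: first-order estimate of how much
# the output changes if that component's parameters were removed.
attributions = np.array([np.sum(grad_W * c) for c in components])

# Keep only the top-k most important components for this forward pass.
active = np.argsort(-np.abs(attributions))[:k]
W_topk = components[active].sum(axis=0)
out_topk = x @ W_topk  # forward pass using only the active components
```

Training then pushes the top-K sparse forward pass to match the full network's output, which increases the causal role of the components that get selected.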
INSIGHT

Minimality Encourages Sharing

  • Minimizing active parameter components per input encourages shared mechanisms across inputs.
  • APD pushes mechanisms used across inputs to merge, avoiding disjoint decompositions.