AXRP - the AI X-risk Research Podcast

41 - Lee Sharkey on Attribution-based Parameter Decomposition

Jun 3, 2025
Lee Sharkey, an interpretability researcher at Goodfire and co-founder of Apollo Research, shares his insights into Attribution-based Parameter Decomposition (APD). He explains how APD can simplify neural networks while maintaining fidelity, discusses the trade-offs between model complexity and performance, and delves into hyperparameter selection. Sharkey also draws analogies between neural network components and car parts, highlighting the importance of understanding feature geometry. The conversation explores future applications of APD and its potential for optimizing neural network efficiency.
INSIGHT

Core of APD Explained

  • APD decomposes a neural network's parameters into component mechanisms that sum to the original parameters.
  • It optimizes for faithfulness, minimality of active components per forward pass, and simplicity of each component.
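The three objectives above can be sketched as toy loss terms. This is a minimal illustration, not the APD implementation: the weight matrix, component count, and the use of a nuclear norm as a simplicity proxy are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a single weight matrix standing in for all parameters.
W = rng.normal(size=(4, 4))

# Decompose W into C candidate parameter components (hypothetical setup).
C = 3
components = rng.normal(size=(C, 4, 4))

# Faithfulness: the components should sum to the original parameters.
faithfulness_loss = np.sum((W - components.sum(axis=0)) ** 2)

# Simplicity: penalize each component's complexity; here a nuclear norm
# is used as an illustrative proxy that favors low-rank components.
simplicity_loss = sum(np.linalg.norm(c, ord="nuc") for c in components)

# (Minimality would additionally restrict how many components are active
# on each forward pass; see the top-K attribution step below in spirit.)
total_loss = faithfulness_loss + simplicity_loss
```

In the real method these terms are weighted by hyperparameters and minimized jointly by gradient descent; the sketch only shows how each pressure is expressed as a loss.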
INSIGHT

Top K Method for Minimality

  • APD uses a top K active parameter components approach per forward pass to optimize minimality.
  • It attributes importance to components via gradients and updates them to increase their causal role in outputs.
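The top-K attribution step can be sketched on a toy linear model. Everything here (a single linear layer, first-order gradient-times-parameter attributions, the component count and K) is an illustrative assumption, not APD's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, k = 3, 4, 2

# Original weights, and C parameter components adjusted to sum exactly to W.
W = rng.normal(size=(d, d))
raw = rng.normal(size=(C, d, d))
components = raw + (W - raw.sum(axis=0)) / C  # enforce faithfulness

x = rng.normal(size=d)

# Gradient of the summed output of y = x @ W with respect to W is outer(x, 1).
grad_W = np.outer(x, np.ones(d))

# Attribute importance to each component: first-order estimate of how much
# the output changes if that component's parameters were removed.
attributions = np.array([np.sum(grad_W * c) for c in components])

# Keep only the top-k most important components for this forward pass.
active = np.argsort(-np.abs(attributions))[:k]
W_topk = components[active].sum(axis=0)
out_topk = x @ W_topk  # forward pass using only the active components
```

Training then pushes the top-K sparse forward pass to match the full network's output, which increases the causal role of the components that get selected.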
INSIGHT

Minimality Encourages Sharing

  • Minimizing active parameter components per input encourages shared mechanisms across inputs.
  • APD pushes mechanisms used across inputs to merge, avoiding disjoint decompositions.