
41 - Lee Sharkey on Attribution-based Parameter Decomposition
AXRP - the AI X-risk Research Podcast
00:00
Intro
This chapter introduces a research paper on interpretability in parameter space, which aims to minimize mechanistic description length through attribution-based parameter decomposition. The discussion covers challenges with sparse autoencoders and the paper's proposed alternative: decomposing a network's parameters directly to better understand how the network works.