
41 - Lee Sharkey on Attribution-based Parameter Decomposition
AXRP - the AI X-risk Research Podcast
Understanding Feature Geometry in Neural Networks
This chapter examines the limitations of Sparse Autoencoders (SAEs) and their reliance on the linear representation hypothesis, emphasizing the need for a deeper understanding of feature geometry in neural networks. It discusses the significance of shared latent variables and their implications for attributing computational structure in AI systems, and ultimately advocates for more comprehensive models that improve interpretability and the assessment of neural network behavior over time.