
41 - Lee Sharkey on Attribution-based Parameter Decomposition

AXRP - the AI X-risk Research Podcast


Understanding Feature Geometry in Neural Networks

This chapter explores the limitations of sparse autoencoders (SAEs) and their reliance on the linear representation hypothesis, emphasizing the need for a deeper understanding of feature geometry in neural networks. It discusses the significance of shared latent variables and what they imply for attributing computational structure in AI systems, and ultimately advocates for more comprehensive models that improve the interpretability and assessment of neural network behavior over time.

