
Neel Nanda - Mechanistic Interpretability

Machine Learning Street Talk (MLST)

Exploring Superposition in Neural Networks

This chapter investigates superposition in neural networks: how a model can encode more features than it has neurons by representing them as overlapping directions in activation space. It examines the challenges of extracting meaningful features from such representations and the complexities of feature interaction in high-dimensional spaces. The discussion also covers empirical research on what superposition implies for mechanistic interpretability in models like GPT-2.
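The idea that a model can encode more features than it has neurons rests on a geometric fact: a d-dimensional space admits far more than d *nearly* orthogonal directions. A minimal sketch (the sizes and the use of random unit vectors are illustrative assumptions, not from the episode) is to pack many random feature directions into a small space and measure their pairwise interference:

```python
import numpy as np

# Sketch of superposition's geometric premise: pack n_features > d
# random unit vectors into d dimensions and measure how close to
# orthogonal they are. Sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d, n_features = 64, 512

# Random unit vectors stand in for feature directions.
W = rng.standard_normal((n_features, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Off-diagonal entries of the Gram matrix are the pairwise dot
# products, i.e. the interference between distinct features.
gram = W @ W.T
off_diag = gram[~np.eye(n_features, dtype=bool)]

print(f"features: {n_features}, dims: {d}")
print(f"max |interference|:  {np.abs(off_diag).max():.3f}")
print(f"mean |interference|: {np.abs(off_diag).mean():.3f}")
```

Even with eight times more features than dimensions, the typical interference between two random directions is small (on the order of 1/sqrt(d)), which is what lets a model trade a little noise for a lot of extra capacity.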
