Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)

Machine Learning Street Talk (MLST)

CHAPTER

Navigating Mechanistic Interpretability

This chapter offers guidance for getting started in mechanistic interpretability, covering essential theory and practice: reading papers, working through coding tutorials, and engaging with online communities. It highlights key techniques such as activation patching and attribution patching, explaining why attributing model behavior to specific components matters and why it is difficult.
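Since the chapter centers on activation patching, a minimal sketch may help make the technique concrete. The example below uses Neel Nanda's TransformerLens library; the prompts, layer, and position are hypothetical placeholders chosen for illustration, not values from the episode.

```python
# A minimal activation patching sketch with TransformerLens.
# Assumes transformer_lens is installed and GPT-2 weights can be downloaded.
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Two prompts that differ in one detail, so any behavioral difference
# can be traced to how the model represents that detail.
clean_prompt = "When John and Mary went to the store, John gave a drink to"
corrupt_prompt = "When John and Mary went to the store, Mary gave a drink to"
clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)

# Cache all activations from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

answer_token = model.to_single_token(" Mary")  # clean run's correct answer

def patch_resid(corrupt_resid, hook, pos):
    # Overwrite the corrupted residual stream at one position
    # with the clean activation cached above.
    corrupt_resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return corrupt_resid

layer, pos = 6, 2  # hypothetical site to patch
hook_name = utils.get_act_name("resid_pre", layer)
patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(hook_name, lambda act, hook: patch_resid(act, hook, pos))],
)
# If patching this site restores the clean answer's logit, that site
# carries information the model uses for the task.
print(patched_logits[0, -1, answer_token].item())
```

Sweeping this patch over every layer and position localizes where task-relevant information lives, but requires one forward pass per site. Attribution patching, the second technique named above, approximates the effect of all such patches in a single backward pass by taking the dot product of the clean-minus-corrupt activation difference with the gradient of the metric, which is what makes it scale to large models.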
