Machine Learning Street Talk (MLST)

Neel Nanda - Mechanistic Interpretability

Jun 18, 2023
Neel Nanda, a researcher at DeepMind specializing in mechanistic interpretability, dives into the intricate world of AI models. He discusses how models can represent thoughts through motifs and circuits, revealing the complexities of superposition where models encode more features than neurons. Nanda explores the fascinating idea of whether models can possess goals and highlights the role of 'induction heads' in tracking long-range dependencies. His insights into the balance between elegant theories and the messy realities of AI add depth to the conversation.
04:10:00

Podcast summary created with Snipd AI

Quick takeaways

  • Models can represent their thoughts using motifs, circuits, and linear directional features, with the 'residual stream' acting as an information highway that carries them between components.
  • Superposition, the ability of models to represent more features than they have neurons, is a key challenge in interpretability.
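
To make the superposition takeaway concrete, here is a minimal sketch in plain Python. It is an illustration, not anything taken from the episode: the choice of five features in two dimensions and the helper names `feature_directions`, `encode`, and `decode` are all assumptions. Each feature gets its own direction, so five features share a two-dimensional activation, and reading one feature back picks up interference from the others.

```python
import math

def feature_directions(n_features):
    """Unit vectors spaced evenly around a circle: n directions in 2 dimensions."""
    return [
        (math.cos(2 * math.pi * i / n_features), math.sin(2 * math.pi * i / n_features))
        for i in range(n_features)
    ]

def encode(feature_values, directions):
    """Add each feature's value along its direction to form a 2-d activation."""
    x = sum(v * d[0] for v, d in zip(feature_values, directions))
    y = sum(v * d[1] for v, d in zip(feature_values, directions))
    return (x, y)

def decode(activation, directions):
    """Read each feature with a dot product; inactive features show interference."""
    return [activation[0] * d[0] + activation[1] * d[1] for d in directions]

# Hypothetical setup: 5 features stored in 2 dimensions, only feature 2 active.
directions = feature_directions(5)
readout = decode(encode([0, 0, 1, 0, 0], directions), directions)
```

In this toy setup the active feature still has the largest readout, but the inactive features do not read as zero. That leakage is why a single neuron or direction can be hard to interpret on its own.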

Deep dives

The Fourier multiplication algorithm: Understanding the mechanism behind grokking

The paper explores the algorithm behind grokking in a one-layer transformer trained to perform modular addition. The authors find that the model learns a Fourier multiplication algorithm, which maps the inputs to sine and cosine waves and combines them with trigonometric identities to perform modular addition. The model gradually transitions from memorization to generalization, helped by regularization techniques such as weight decay. The paper also highlights the importance of mechanistic understanding in disentangling memorization from generalization. The progress measures the authors develop shed light on the different phases of training: memorization, circuit formation, and grokking. They confirm that grokking is not sudden generalization but gradual generalization followed by a sudden cleanup in test loss.
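
The trigonometric approach described above can be sketched in plain Python. This is a hand-written illustration of the idea, not the trained model's weights or the authors' code: the function name `fourier_mod_add`, the modulus `p=113`, and the frequencies `(1, 2, 5)` are assumptions chosen for the example. It represents each input as sine and cosine waves, uses the angle-addition identities to get waves for `a + b`, and scores every candidate answer `c` by `cos(w * (a + b - c))`, which peaks when `c` equals `(a + b) mod p`.

```python
import math

def fourier_mod_add(a, b, p=113, freqs=(1, 2, 5)):
    """Compute (a + b) mod p using only sines, cosines, and products.

    p and freqs are illustrative assumptions; p should be prime so that every
    frequency in freqs gives a unique peak.
    """
    logits = [0.0] * p
    for k in freqs:
        w = 2 * math.pi * k / p
        # Represent each input as a point on the unit circle.
        cos_a, sin_a = math.cos(w * a), math.sin(w * a)
        cos_b, sin_b = math.cos(w * b), math.sin(w * b)
        # Angle-addition identities give cos(w(a+b)) and sin(w(a+b)).
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # cos(w(a+b-c)) is maximal (equal to 1) exactly when c = (a+b) mod p.
        for c in range(p):
            logits[c] += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
    # The predicted answer is the candidate with the largest summed score.
    return max(range(p), key=logits.__getitem__)

result = fourier_mod_add(100, 50)  # (100 + 50) % 113 == 37
```

Summing over several frequencies sharpens the peak at the correct answer, since wrong candidates score high on one frequency but rarely on all of them at once.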