Machine Learning Street Talk (MLST)

Neel Nanda - Mechanistic Interpretability

Jun 18, 2023
04:10:00

Podcast summary created with Snipd AI

Quick takeaways

  • Models represent their internal computations through recurring motifs, circuits, and features encoded as linear directions, with the 'residual stream' acting as an information highway between layers.
  • Superposition, a model's ability to represent more features than it has neurons, is a key challenge for interpretability.
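
To make the superposition takeaway concrete, here is a minimal sketch, not taken from the episode, of three features sharing a two-dimensional space. The directions (120 degrees apart), the ReLU readout, and the names `DIRECTIONS`, `encode`, and `decode` are all assumptions chosen for illustration.

```python
import math

# Assumption: three feature directions packed into a 2-dimensional space,
# spaced 120 degrees apart so that no two are orthogonal.
DIRECTIONS = [
    (math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
    for i in range(3)
]


def encode(features):
    """Write feature activations into the shared 2-D space as a weighted sum."""
    x = sum(f * d[0] for f, d in zip(features, DIRECTIONS))
    y = sum(f * d[1] for f, d in zip(features, DIRECTIONS))
    return (x, y)


def decode(vec):
    """Read each feature back with a dot product followed by a ReLU."""
    return [max(0.0, vec[0] * d[0] + vec[1] * d[1]) for d in DIRECTIONS]


one_active = decode(encode([1.0, 0.0, 0.0]))  # a single feature is on
two_active = decode(encode([1.0, 1.0, 0.0]))  # two features are on at once
```

In this toy setup a single active feature is read back cleanly, because the ReLU removes the negative interference on the other two directions. When two features are active together, their readouts interfere, which hints at why superposition depends on features being sparse and why it complicates interpretation.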

Deep dives

The Fourier multiplication algorithm: Understanding the mechanism behind grokking

The paper explores the algorithm behind grokking in a one-layer transformer trained to perform modular addition. The authors find that the model learns a Fourier multiplication algorithm, which uses trigonometric functions and their composition to perform modular addition. The model transitions gradually from memorization to generalization, helped by regularization techniques such as weight decay. The paper also highlights the importance of mechanistic understanding in disentangling memorization from generalization. The progress measures the authors develop shed light on the different phases of training (memorization, circuit formation, and cleanup), confirming that grokking is not sudden generalization but rather gradual generalization followed by a sudden cleanup that shows up in test loss.
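
The idea of doing modular addition with trigonometric functions can be sketched in a few lines. This is a hand-written illustration under stated assumptions, not the trained model's weights: the function name `fourier_mod_add` and the frequencies `(1, 5, 17)` are arbitrary choices made here, and the modulus 113 is only a default.

```python
import math


def fourier_mod_add(a, b, p=113, freqs=(1, 5, 17)):
    """Compute (a + b) mod p using only sines, cosines, and trig identities.

    Assumption: `freqs` is an arbitrary illustrative set of frequencies.
    """
    logits = [0.0] * p
    for k in freqs:
        w = 2 * math.pi * k / p
        # Represent each input as a point on the unit circle.
        cos_a, sin_a = math.cos(w * a), math.sin(w * a)
        cos_b, sin_b = math.cos(w * b), math.sin(w * b)
        # Angle-addition identities give cos(w(a+b)) and sin(w(a+b))
        # from products of the individual terms.
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # cos(w(a+b-c)) equals 1 exactly when c == (a + b) mod p.
        for c in range(p):
            logits[c] += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
    # The candidate where every frequency agrees has the largest score.
    return max(range(p), key=logits.__getitem__)
```

For example, `fourier_mod_add(100, 50)` scores every candidate `c` from 0 to 112 and returns the one where all the cosine terms line up, which is `(100 + 50) % 113`. Summing over several frequencies widens the gap between the correct answer and its nearest competitors.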