Machine Learning Street Talk (MLST)

Neel Nanda - Mechanistic Interpretability

Jun 18, 2023
Neel Nanda, a researcher at DeepMind specializing in mechanistic interpretability, dives into the intricate world of AI models. He discusses how models can represent thoughts through motifs and circuits, revealing the complexities of superposition where models encode more features than neurons. Nanda explores the fascinating idea of whether models can possess goals and highlights the role of 'induction heads' in tracking long-range dependencies. His insights into the balance between elegant theories and the messy realities of AI add depth to the conversation.
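To make the superposition idea concrete, here is a toy numerical sketch (my own illustration, not code from the episode): pack more feature directions than dimensions using nearly-orthogonal random vectors, then read a sparse set of active features back off with only small interference.

```python
# Toy illustration of superposition: more "features" than dimensions,
# stored as nearly-orthogonal directions. Numbers here are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 64, 512  # 512 feature directions packed into 64 dimensions

# Random unit vectors in high dimensions are nearly orthogonal.
W = rng.normal(size=(n_features, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Encode a sparse set of active features as a single d-dimensional activation.
active = rng.choice(n_features, size=5, replace=False)
x = W[active].sum(axis=0)

# Decode by projecting onto every feature direction: active features score
# near 1, inactive ones only pick up small interference noise.
scores = W @ x
recovered = np.argsort(scores)[-5:]
print(sorted(active.tolist()), sorted(recovered.tolist()))
```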
ANECDOTE

T-Shirt Story

  • Neel Nanda explains his Grokking t-shirt's origin in meme culture.
  • It depicts a Shoggoth with a smiley face, symbolizing language models' hidden complexity.
ADVICE

MechInterp Starting Point

  • Getting started with mechanistic interpretability is easier than it seems.
  • Start with simple models like GPT-2 in Colab notebooks for fast feedback (see the sketch after this snip).
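As a concrete starting point in the spirit of this advice, here is a minimal sketch of loading GPT-2 and caching its activations. It assumes the TransformerLens library (Nanda's open-source tooling, not named explicitly in the snip), installed with `pip install transformer_lens` in a Colab cell.

```python
# Minimal sketch: load GPT-2 and inspect one attention pattern.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small model, fast feedback

prompt = "When Mary and John went to the store, John gave a drink to"
logits, cache = model.run_with_cache(prompt)

# Attention patterns for layer 0: shape [batch, head, query_pos, key_pos].
attn = cache["pattern", 0]
print(attn.shape)
print(model.to_str_tokens(prompt))
```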
INSIGHT

Alien Neuroscience of Models

  • Models have their own "alien" ways of representing concepts, unlike human intuitions.
  • Nanda's modular addition work shows models use rotations, not typical addition algorithms (see the worked example below).
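A small worked example of the rotation idea (a sketch; the modulus p = 113 matches the published grokking work, but treat it as an assumption here): represent each residue as an angle on the unit circle, so composing the two rotations adds the angles, and the final angle encodes (a + b) mod p.

```python
# Computing modular addition via rotations, as the grokked model does.
import numpy as np

p = 113          # modulus (assumed from the grokking experiments)
a, b = 47, 92

# Represent each residue x as a point on the unit circle at angle 2*pi*x/p.
theta_a = 2 * np.pi * a / p
theta_b = 2 * np.pi * b / p

# Composing the rotations adds the angles; the combined angle encodes (a + b) mod p.
theta_sum = (theta_a + theta_b) % (2 * np.pi)
recovered = int(round(theta_sum * p / (2 * np.pi))) % p

assert recovered == (a + b) % p
print(recovered, (a + b) % p)
```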