MIT Technology Review Narrated

Google DeepMind has a new way to look inside an AI’s “mind”

Sep 10, 2025
Discover how sparse autoencoders are allowing researchers to peek into the intricate workings of artificial intelligence. The podcast explores Gemma Scope, Google DeepMind's tool designed to improve our understanding of how its models reach their decisions. The tool may also help mitigate bias and errors, paving the way for more reliable AI systems. Tune in to learn about the exciting potential of mechanistic interpretability!
INSIGHT

Reverse-Engineering AI Internals

  • Mechanistic interpretability seeks to reverse-engineer how neural networks produce outputs.
  • Understanding internal algorithms could reveal hidden failure modes before deployment.
INSIGHT

Layerwise Sparse Autoencoders Reveal Features

  • DeepMind ran sparse autoencoders on activations from each layer of the model to surface interpretable features.
  • Varying the sparsity setting revealed features at multiple granularities, trading off detail against interpretability (a minimal sketch follows this list).
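To make the idea concrete, here is a minimal sparse-autoencoder sketch in PyTorch. The layer width, dictionary size, and L1 sparsity penalty are illustrative assumptions rather than DeepMind's actual Gemma Scope recipe, but the structure is the core idea: encode a layer's activations into a much wider, mostly-zero feature vector, then reconstruct the activations from it.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over one layer's activations.

    Hypothetical dimensions: d_model is the width of the layer being probed,
    d_features is the (much larger) dictionary of candidate features.
    """

    def __init__(self, d_model: int = 2304, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # Encode activations into a wide, mostly-zero feature vector.
        features = torch.relu(self.encoder(activations))
        # Reconstruct the original activations from those features.
        reconstruction = self.decoder(features)
        return features, reconstruction


def sae_loss(activations, features, reconstruction, l1_coeff: float = 1e-3):
    # The reconstruction term keeps the features faithful to the layer;
    # the L1 term pushes most features toward zero (sparsity), which is
    # what makes individual features easier to interpret.
    recon = torch.mean((reconstruction - activations) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return recon + sparsity


if __name__ == "__main__":
    sae = SparseAutoencoder()
    # Stand-in for a batch of activations captured at one model layer.
    acts = torch.randn(8, 2304)
    feats, recon = sae(acts)
    print(sae_loss(acts, feats, recon).item())
```

Sweeping the hypothetical `l1_coeff` knob up or down is one simple way to get coarser or finer-grained features, which is the detail-versus-interpretability trade-off the snip describes.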
ANECDOTE

Chihuahua Triggers a Dogs Feature

  • Prompting Gemma about a Chihuahua lights up a Dogs feature that represents dog-related knowledge (a sketch of how such a feature could be spotted follows below).
  • DeepMind open-sourced these features as part of Gemma Scope to let researchers map how representations progress from layer to layer.
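As an illustration of how such a feature might be spotted in practice, here is a small hypothetical helper (names and shapes are assumptions, not part of the Gemma Scope release): given the encoded feature activations from a sparse autoencoder like the sketch above, it ranks which feature indices fire most strongly on a prompt.

```python
import torch

def top_features(feature_acts: torch.Tensor, k: int = 5):
    """Rank sparse-autoencoder feature indices by activation strength.

    feature_acts: a (tokens, d_features) tensor of encoded activations,
    e.g. the `features` output of the sketch above, computed on
    activations captured while the model reads a dog-related prompt.
    """
    scores = feature_acts.mean(dim=0)        # average strength per feature
    values, indices = torch.topk(scores, k)  # strongest features first
    return list(zip(indices.tolist(), values.tolist()))

# A feature index that tops this ranking on dog prompts but stays near zero
# on unrelated text is a candidate "Dogs" feature like the one described above.
```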