

Google DeepMind has a new way to look inside an AI’s “mind”
Sep 10, 2025
Discover how sparse autoencoders are allowing researchers to peek into the intricate workings of artificial intelligence. The podcast explores Gemma Scope, Google DeepMind's tool for surfacing the features behind an AI's decisions. This kind of tooling may also help catch bias and errors, paving the way for more reliable AI systems. Tune in to learn about the exciting potential of mechanistic interpretability!
AI Snips
Reverse-Engineering AI Internals
- Mechanistic interpretability seeks to reverse-engineer how neural networks produce outputs.
- Understanding internal algorithms could reveal hidden failure modes before deployment.
Layerwise Sparse Autoencoders Reveal Features
- DeepMind ran sparse autoencoders on each model layer to surface interpretable features.
- Varying the sparsity penalty revealed features at multiple granularities, balancing detail against interpretability (see the sketch after this list).
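A minimal sketch of the idea, not DeepMind's actual Gemma Scope training code: a sparse autoencoder learns to reconstruct one layer's activations through a wide, mostly-zero feature layer, and the L1 penalty weight (here the hypothetical `l1_coef`) trades off how many features fire at once. The dimensions and settings below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder over one layer's activation vectors."""
    def __init__(self, act_dim: int, hidden_dim: int):
        super().__init__()
        self.encoder = nn.Linear(act_dim, hidden_dim)   # activations -> feature space
        self.decoder = nn.Linear(hidden_dim, act_dim)   # features -> reconstructed activations

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))       # non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(recon, acts, features, l1_coef: float):
    # Reconstruction error plus an L1 sparsity penalty: a larger l1_coef yields
    # fewer, coarser active features; a smaller one yields many fine-grained ones.
    mse = torch.mean((recon - acts) ** 2)
    sparsity = l1_coef * features.abs().mean()
    return mse + sparsity

# Usage on fake activations (real ones would come from a hook on a Gemma layer).
acts = torch.randn(64, 2304)                            # batch of activation vectors
sae = SparseAutoencoder(act_dim=2304, hidden_dim=16384) # overcomplete feature dictionary
recon, features = sae(acts)
loss = sae_loss(recon, acts, features, l1_coef=1e-3)
loss.backward()
```

In the layerwise setup described in the episode, one such autoencoder would be trained per model layer, so each layer's activations get their own dictionary of interpretable features.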
Chihuahua Triggers a "Dogs" Feature
- Prompting Gemma about a Chihuahua lights up a "Dogs" feature that represents dog-related knowledge.
- DeepMind open-sourced the features to let researchers map how representations progress from layer to layer (see the probe sketch below).
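A hedged sketch of how such a probe might look, building on the `SparseAutoencoder` from the sketch above rather than Gemma Scope's released tooling: feed a prompt's layer activations through a trained autoencoder and rank the features by how strongly they fire. The activations here are faked with noise; in practice they would come from a forward hook on a Gemma layer.

```python
import torch

def top_features(sae, acts: torch.Tensor, k: int = 5):
    """Rank SAE features by their average activation across token positions."""
    with torch.no_grad():
        _, features = sae(acts)            # shape: [num_tokens, num_features]
    mean_act = features.mean(dim=0)
    values, indices = torch.topk(mean_act, k)
    return list(zip(indices.tolist(), values.tolist()))

# Fake activations standing in for a prompt like "Tell me about Chihuahuas";
# with real activations and a trained SAE, a dog-related feature would rank highest.
fake_acts = torch.randn(12, 2304)
print(top_features(sae, fake_acts))
```

Running the same probe at every layer is how the open-sourced features let researchers trace how a representation (here, "dogs") develops as it moves through the model.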