
LessWrong (Curated & Popular) “Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
Dec 9, 2024
Dive into gradient routing, a technique that controls where learning happens in neural networks by applying masks to gradients during backpropagation. Discover how it can lead to safer AI systems by enabling transparency and oversight. Learn how it was used to split an MNIST autoencoder's latent space so that different digit classes are encoded in different halves, and to localize computation in language models. The discussion also touches on robust unlearning and the importance of scalable oversight, showcasing the potential of specialized AI.
Gradient Routing
- Gradient routing controls where learning happens in neural networks by masking gradients during backpropagation.
- Different masks for different data points create specialized subcomponents within a model (a minimal sketch of the mechanism follows this list).
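A minimal sketch of the core trick in PyTorch, not the authors' exact code: the forward pass is left unchanged, while detach() blocks the backward pass wherever the mask is zero. The helper name route_gradients is hypothetical.

```python
import torch

def route_gradients(activations: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Identity in the forward pass; in the backward pass, gradients flow
    only through entries where mask == 1, because detach() blocks
    backpropagation through the remaining entries."""
    return mask * activations + (1 - mask) * activations.detach()
```

Supplying a different mask for each data point (based on its label, source, or content) is what carves out specialized subcomponents during training.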
MNIST Latent Space Splitting
- An MNIST autoencoder was trained with gradient routing to split its latent space (see the sketch after this list).
- Digits 0-4 were routed through one half of the latent space, and digits 5-9 through the other, demonstrating specialization.
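A sketch of how that routing could look in PyTorch, assuming a 32-dimensional latent for illustration; the encoder and decoder modules and the training loop are elided, and the mask construction is an assumption about the setup, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

LATENT_DIM = 32  # assumed latent size, for illustration

def latent_mask(labels: torch.Tensor) -> torch.Tensor:
    """Per-example mask over the latent: digits 0-4 may only update the
    first half of the dimensions, digits 5-9 only the second half."""
    mask = torch.zeros(labels.shape[0], LATENT_DIM, device=labels.device)
    bottom = labels <= 4
    mask[bottom, : LATENT_DIM // 2] = 1.0
    mask[~bottom, LATENT_DIM // 2 :] = 1.0
    return mask

def training_step(encoder, decoder, images, labels):
    z = encoder(images)                      # (batch, LATENT_DIM)
    m = latent_mask(labels)
    z = m * z + (1 - m) * z.detach()         # gradient routing: stop-gradient off-mask
    recon = decoder(z)
    return F.mse_loss(recon, images)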
Steering Scalar
- Routing the token "California" to a specific residual-stream dimension in a language model localized the related features there (sketched below).
- This showed that gradient routing can direct where specific features are learned in the model.
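A sketch of that localization using the same stop-gradient trick: at positions holding the target token, only one residual-stream dimension receives gradient, so the feature is pushed into a known location. The dimension index, helper name, and hook placement are all illustrative assumptions, not the authors' exact setup.

```python
import torch

STEER_DIM = 0  # assumed residual-stream dimension chosen to hold the feature

def steering_mask(token_ids: torch.Tensor, target_id: int, d_model: int) -> torch.Tensor:
    """Ones everywhere (normal learning), except at positions whose token is
    target_id, where only STEER_DIM may receive gradient."""
    mask = torch.ones(*token_ids.shape, d_model, device=token_ids.device)
    at_target = token_ids == target_id
    mask[at_target] = 0.0
    mask[at_target, STEER_DIM] = 1.0
    return mask

# Applied to the residual stream h of shape (batch, seq, d_model) at some
# layer, with california_id the token id for "California":
# m = steering_mask(token_ids, california_id, h.shape[-1])
# h = m * h + (1 - m) * h.detach()
```

Because the learned feature then lives in a known dimension, it can be inspected directly or scaled up and down to steer the model's behavior, which is the "steering scalar" in the snip title.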
