The Inside View

Neel Nanda on mechanistic interpretability, superposition and grokking

Sep 21, 2023
Neel Nanda, a researcher at Google DeepMind, discusses mechanistic interpretability in AI, induction heads in models, and his journey into alignment. He explores scalable oversight, the ambitious goal of fully interpreting transformer architectures, and whether humans can realistically understand complex models. The episode also covers linear representations in neural networks, superposition of features in models, the MATS mentorship program, and the importance of interpretability in AI systems.