

Neel Nanda
Senior research scientist at Google DeepMind, where he leads the mechanistic interpretability team, and author of the LessWrong post "Interpretability Will Not Reliably Find Deceptive AI". Known for his personal hot takes on AI safety and deception.
Top 3 podcasts with Neel Nanda
Ranked by the Snipd community

152 snips
Jun 18, 2023 • 4h 10min
Neel Nanda - Mechanistic Interpretability
Neel Nanda, a researcher at DeepMind specializing in mechanistic interpretability, dives into the inner workings of AI models. He discusses how models represent concepts through motifs and circuits, and unpacks superposition, where models encode more features than they have neurons. Nanda explores whether models can meaningfully be said to have goals and highlights the role of 'induction heads' in tracking long-range dependencies. His insights into the tension between elegant theories and the messy realities of real models add depth to the conversation.
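For readers new to the terminology in this episode, here is a minimal sketch of the superposition idea, written in NumPy; it is not code from the podcast, and all sizes and names are illustrative assumptions. A layer with fewer neurons than features can still store each feature as its own direction, provided inputs are sparse, at the cost of interference between features.

```python
# Toy illustration of superposition: 8 features stored in a 4-dimensional space.
# Illustrative sketch only; dimensions and names are assumptions, not from the episode.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 8, 4           # more features than "neurons" (dimensions)

# Each feature gets a random unit-length direction in the smaller space.
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only feature 3 is active (superposition relies on sparsity).
x = np.zeros(n_features)
x[3] = 1.0

h = x @ W                            # compress 8 features into 4 dimensions
x_hat = h @ W.T                      # naive linear readout back to feature space

# Feature 3 comes back with weight ~1; the other entries are interference
# from features sharing the same 4 dimensions.
print(np.round(x_hat, 2))
```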

124 snips
Dec 7, 2024 • 3h 43min
Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)
Neel Nanda, a senior research scientist at Google DeepMind, leads the mechanistic interpretability team. At just 25, he explores the complexities of neural networks and the role of sparse autoencoders in AI safety. Nanda discusses the challenges of understanding model behaviors such as reasoning and deception, and emphasizes the need for deeper insight into the internal structure of AI models to improve safety and interpretability. The conversation also covers techniques for extracting meaningful features with sparse autoencoders and advice on navigating the field of mechanistic interpretability.
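As a rough illustration of the sparse autoencoder technique this episode centers on, the sketch below follows common conventions rather than any code from DeepMind: an overcomplete dictionary of features is learned so that sparse, non-negative feature activations reconstruct a model's hidden states, trading reconstruction error against an L1 sparsity penalty. The layer sizes and function names are assumptions for illustration.

```python
# Minimal sparse autoencoder sketch (assumed conventions, not the episode's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)   # overcomplete: d_dict >> d_model
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))   # non-negative, hopefully sparse activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(acts, recon, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most feature activations to zero.
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().sum(dim=-1).mean()

# Usage on stand-in activations; a real run would use hidden states from a language model.
acts = torch.randn(32, 512)                         # batch of hidden states, d_model = 512
sae = SparseAutoencoder(d_model=512, d_dict=4096)
recon, features = sae(acts)
loss = sae_loss(acts, recon, features)
loss.backward()
```

The hope, as discussed in the episode, is that each learned dictionary feature is more interpretable than a raw neuron precisely because the sparsity penalty discourages the kind of superposition sketched above.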

May 5, 2025 • 13min
“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
Neel Nanda, a thought leader on AI safety, shares his intriguing insights on interpretability and its limits. He argues that relying solely on interpretability to detect deceptive AI is naive. Instead, he advocates for a multi-faceted defense strategy that includes black-box methods alongside interpretability. Nanda emphasizes that while interpretability can enhance our understanding, it's just one layer in ensuring AI safety. His hot takes spark a provocative discussion on the challenges we face with superintelligent systems.