
Neel Nanda

Senior research scientist at Google DeepMind, where he leads the mechanistic interpretability team. Author of the LessWrong post "Interpretability Will Not Reliably Find Deceptive AI" and known for his personal hot takes on AI safety and deception.

Top 3 podcasts with Neel Nanda

Ranked by the Snipd community
152 snips
Jun 18, 2023 • 4h 10min

Neel Nanda - Mechanistic Interpretability

Neel Nanda, a researcher at DeepMind specializing in mechanistic interpretability, dives into the inner workings of AI models. He discusses how models represent computation through motifs and circuits, and explains superposition, where models encode more features than they have neurons. Nanda explores whether models can possess goals and highlights the role of induction heads in tracking long-range dependencies. His reflections on the tension between elegant theories and the messy realities of AI add depth to the conversation.
124 snips
Dec 7, 2024 • 3h 43min

Neel Nanda - Mechanistic Interpretability (Sparse Autoencoders)

Neel Nanda, a senior research scientist at Google DeepMind, leads the mechanistic interpretability team. At just 25, he explores the complexities of neural networks and the role of sparse autoencoders in AI safety. Nanda discusses challenges in understanding model behaviors, such as reasoning and deception. He emphasizes the need for deeper insights into the internal structures of AI to enhance safety and interpretability. The conversation also touches on innovative techniques for generating meaningful features and navigating mechanistic interpretability.
May 5, 2025 • 13min

“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda

Neel Nanda, a thought leader on AI safety, shares his insights on interpretability and its limits. He argues that relying solely on interpretability to detect deceptive AI is naive, and instead advocates for a multi-layered defense strategy that combines black-box methods with interpretability. Nanda emphasizes that while interpretability can deepen our understanding, it is just one layer in ensuring AI safety. His hot takes spark a provocative discussion of the challenges posed by superintelligent systems.
