
Neel Nanda

Final-year maths undergraduate at the University of Cambridge, gold medalist at the International Mathematical Olympiad, and an active member of the rationalist and effective altruism communities.

Top 5 podcasts with Neel Nanda

Ranked by the Snipd community
152 snips
Jun 18, 2023 • 4h 10min

Neel Nanda - Mechanistic Interpretability

In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and features encoded as linear directions, often communicated via the "residual stream", an information highway models use to pass information between layers.

Neel argues that "superposition", the ability of models to represent more features than they have neurons, is one of the biggest open problems in interpretability, because it thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition. However, he notes we must start small, build rigorous foundations, and not assume our theoretical frameworks perfectly match reality.

The conversation turns to whether models can have goals or agency. Neel argues they likely can, based on heuristics like models executing long-term plans towards some objective. However, we currently lack techniques to build models with specific goals, meaning any goals would likely be learned or emergent. Neel also highlights how induction heads, circuits models use to track long-range dependencies, seem crucial for phenomena like in-context learning to emerge.

On existential risks from AI, Neel believes we should avoid overly confident claims that models will or will not be dangerous, as we do not understand them well enough to make confident theoretical assertions. Still, models could pose risks through being misused, having undesirable emergent properties, or being imperfectly aligned. Neel argues we must pursue rigorous empirical work to better understand and ensure model safety, avoid "philosophizing" about definitions of intelligence, and ensure researchers have standards for what it means to decide a system is "safe" before deploying it. Overall, a thoughtful conversation on one of the most important issues of our time.

Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Twitter: https://twitter.com/MLStreetTalk
Neel Nanda: https://www.neelnanda.io/

TOC:
[00:00:00] Introduction and Neel Nanda's Interests (walk and talk)
[00:03:15] Mechanistic Interpretability: Reverse Engineering Neural Networks
[00:13:23] Discord questions
[00:21:16] Main interview kick-off in studio
[00:49:26] Grokking and Sudden Generalization
[00:53:18] The Debate on Systematicity and Compositionality
[01:19:16] How do ML models represent their thoughts
[01:25:51] Do Large Language Models Learn World Models?
[01:53:36] Superposition and Interference in Language Models
[02:43:15] Transformers discussion
[02:49:49] Emergence and In-Context Learning
[03:20:02] Superintelligence/XRisk discussion

Transcript: https://docs.google.com/document/d/1FK1OepdJMrqpFK-_1Q3LQN6QLyLBvBwWW_5z8WrS1RI/edit?usp=sharing
Refs: https://docs.google.com/document/d/115dAroX0PzSduKr5F1V4CWggYcqIoSXYBhcxYktCnqY/edit?usp=sharing
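For readers new to the superposition idea described above, here is a minimal numerical sketch. This is our own illustration, not from the episode, and all sizes and feature indices are invented for the example: a model stores more features than it has neurons by assigning each feature a nearly-orthogonal direction, at the cost of interference when reading features back out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 20, 5  # more features than neurons

# Give each feature a random unit-norm direction in neuron space.
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Encode a sparse input: only features 2 and 11 are active.
features = np.zeros(n_features)
features[[2, 11]] = 1.0
activation = features @ directions  # shape (n_neurons,): the superposed code

# Decode by projecting the activation back onto every feature direction.
readout = directions @ activation
print(np.round(readout, 2))  # entries 2 and 11 read near 1; the rest show
                             # small but nonzero "interference" noise
```

The interference terms in the readout are exactly why superposition frustrates the decomposition of a model into clean units of analysis: no single neuron corresponds to any single feature.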
25 snips
Apr 19, 2020 • 57min

#9 – Neel Nanda on Effective Planning and Building Habits that Stick

Neel Nanda is a final year maths undergraduate at the University of Cambridge, and a gold medalist in the International Mathematical Olympiad. He teaches regularly, from revision lectures to a recent 'public rationality' workshop. Neel is also an active member of the rationalist and effective altruism communities.

In this episode we discuss:
- How to view self-improvement and optimise for your goals
- Forming good habits through the 'TAPs' technique
- How to build effective plans using our 'inner simulator' and 'pre-hindsight'

You can read more in this episode's accompanying write-up: hearthisidea.com/episodes/neel. You can also read Neel's teaching notes for his planning workshop here. If you have any feedback or suggestions for future guests, please get in touch through our website. Also, Neel has created an anonymous feedback form for this episode, and he would love to hear any of your thoughts!

Please also consider leaving a review on Apple Podcasts or wherever you're listening to this; we're just starting out and it would really help listeners find us! If you want to support the show more directly, you can also buy us a beer at tips.pinecast.com/jar/hear-this-idea. Thanks for listening!
4 snips
Sep 21, 2023 • 2h 5min

Neel Nanda on mechanistic interpretability, superposition and grokking

Neel Nanda, a researcher at Google DeepMind, discusses mechanistic interpretability in AI, induction heads in models, and his journey into alignment. He explores scalable oversight, how ambitious interpretability of transformer architectures can realistically be, and whether humans are capable of understanding complex models at all. The episode also covers linear representations in neural networks, superposition of features in models, the MATS mentorship program, and the importance of interpretability for AI systems.
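As a concrete illustration of the "linear representations" idea mentioned above, here is a toy sketch, our own rather than from the episode: if a binary feature is written along one fixed direction in activation space, a simple linear probe can read it back out. The data is synthetic and all sizes are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n_samples = 64, 2000

# Hypothetical setup: a binary feature is written along one fixed direction
# of activation space, on top of noise from everything else a model encodes.
feature_dir = rng.normal(size=d_model)
feature_dir /= np.linalg.norm(feature_dir)

labels = rng.integers(0, 2, size=n_samples)
signal = np.outer(2.0 * (2 * labels - 1), feature_dir)  # +/- 2 along the direction
acts = signal + rng.normal(size=(n_samples, d_model))

# A linear probe recovers the feature well above chance despite the noise.
probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("probe accuracy:", probe.score(acts[1500:], labels[1500:]))
```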
Jan 20, 2024 • 41min

[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

Neel Nanda, an expert in mechanistic interpretability, discusses the challenges and potential applications of the field with ryan_greenblatt, Buck, and habryka. They explore concrete projects, debate how useful mechanistic interpretability really is, and discuss the limits on achieving interpretability in transformative models like GPT-4. They also delve into model safety and ablations, including the potential to rule out problematic behavior without fully understanding a model's internals. The speakers close by reflecting on the dialogue and how it advanced their thinking about mechanistic interpretability.
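For a sense of what an ablation experiment of the kind mentioned above looks like in practice, here is a minimal sketch using the TransformerLens library, which Neel maintains: zero out one attention head's output and see how the model's loss moves. Treat the specific model, prompt, head choice, and hook names as assumptions for illustration, not a recipe from the dialogue.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in the city of Paris")

def zero_head(z, hook, head=5):
    # z has shape [batch, position, head_index, d_head]; silence one head.
    z[:, :, head, :] = 0.0
    return z

baseline = model(tokens, return_type="loss")
ablated = model.run_with_hooks(
    tokens,
    return_type="loss",
    fwd_hooks=[("blocks.9.attn.hook_z", zero_head)],  # layer 9's attention output
)
print(f"loss {baseline.item():.3f} -> {ablated.item():.3f} after ablating L9H5")
```

If the loss barely moves across a broad distribution of inputs, that is (weak) evidence the behavior does not depend on the ablated component, which is the flavor of "ruling out problematic behavior" the dialogue gestures at.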
Jan 11, 2021 • 1h 44min

How can we optimise for a meaningful life?

In this episode, we're joined by Neel Nanda to discuss life optimisations, career choices, how to live in alignment with your own values, and how to treat your relationships more intentionally. We discuss some of the principles behind how we can do the most good with our lives, while also being fulfilled and enjoying the journey.

Links:
- 80000 Hours Key Ideas
- Misconceptions about EA
- Map of local EA groups
- Neel's Blog
- Neel's Top Posts
- Neel's Post on Excitement & Motivation

If you want to read more about the ideas behind EA, Neel suggested these two books:
- Doing Good Better
- The Precipice

Sponsored by Brilliant
This episode is kindly supported by Brilliant, the best way to learn maths, science, and computer science online. Brilliant focuses on helping you learn how to think, rather than just memorising methods and facts. Sign up at https://brilliant.org/notoverthinking; the first 100 people get 20% off an annual subscription.

Join our Membership Community Thing
If you'd like to potentially join the Not Overthinking membership community thing (we discuss it halfway through this episode), please fill out your info here: https://airtable.com/shr1UZc8sWnhAEwJg

Leave us a Review
If you enjoy listening to the podcast, we'd love for you to leave us a review on iTunes / Apple Podcasts. Here's a link that works even if you're not on an iPhone :)

Send us an Audio Message
We really want to include more listener comments and questions in our episodes. If you've got any thoughts on this episode, or if you've got a conundrum or question you'd like us to discuss, send an audio file / voice note to hi@notoverthinking.com. For any non-audio comments, drop us a tweet or DM on Twitter: https://twitter.com/noverthinking

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.