
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
Is Mechanistic Interpretability the Only Path to Understanding Neural Networks?
I would really prefer to live in a world where we actually understand these things. And I think that mechanistic interpretability is not obviously the only path to get to a point where we can make some claim to understand systems. But it's a promising path that achieves a pretty high level of rigor, reliability, and understanding. The more we can claim that we actually understand what's going on inside these models, the better off we'll be.