AXRP - the AI X-risk Research Podcast cover image

19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

CHAPTER

Is Mechanistic Interpretability the Only Path to Understanding Neural Networks?

I would really prefer to live in a world where we actually understand these things. And I think that mechanistic interpretability is not obviously the only path to get to a point where we can make some claim to understand systems. But it's a promising path that achieves a pretty high level of rigor and reliability and understanding. The more we can claim that we actually understand what's going on inside these models, the better off we'll be.

00:00
Transcript
Play full episode

Remember Everything You Learn from Podcasts

Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.
App store bannerPlay store banner