
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
The Science of Deep Learning and Mechanistic Interpretability
There are two stories of why mechanistic interpretability could be useful. One is that you want to develop these mechanistic interpretability tools, and the way you use them is: one day you're going to train a model and you'll want to know whether it's a good model or a bad model in terms of how it's thinking about stuff. Then there's another story where you're like, okay, I don't know in advance which thing is going to be important. It can be a mistake to be too goal-directed when trying to do basic science. Lots of things that get done will turn out