19 - Mechanistic Interpretability with Neel Nanda

AXRP - the AI X-risk Research Podcast

The Science of Deep Learning and Mechanistic Interpretability

There are two stories of why mechanistic interpretability could be useful. One is that you want to develop these mechanistic interpretability tools, and the way you'd use them is that one day you're going to train a model and you'll want to know whether it's a good model or a bad model in terms of how it's thinking about stuff. Then there's another story where you're like, okay, I want to understand how this works, and I don't know in advance which value of squibble is going to be important. It can be a mistake to be too goal-directed when trying to do basic science. Lots of things that get done will turn out…
