The Science of Deep Learning and Mechanistic Interpretability
There are two stories of why mechanistic interpretability could be useful. One is that you do it to develop these mechanistic interpretability tools, and the way you use them is that one day you're going to train a model and you'll want to know whether it's a good model or a bad model in terms of how it's thinking about things. Then there's another story where you're like, okay, I just want to understand how this works; I don't know in advance what value of squibble is going to be important. It can be a mistake to be too goal-directed when trying to do basic science. Lots of things that get done will turn out