The Importance of Interpretability

There are several stories you can tell about interpretability. Maybe the easiest one is something like: we can detect when the model is lying to us. And traditionally, there's this field of mechanistic interpretability, which is trying to reverse engineer what's going on inside of neural networks. Ideally, once you've done mechanistic interpretability, you could write the model down as a program in a programming language and all of a sudden have a very interpretable model.
