AXRP - the AI X-risk Research Podcast

21 - Interpretability for Engineers with Stephen Casper

The Importance of Intrinsic Interpretability in AI Safety

There are a lot of nice properties that intrinsic interpretability techniques can add to neural nets. And there are lots of different techniques that don't conflict with one another. I think it might be very interesting, sometime in the near future, to just work on more intrinsically interpretable architectures as a stepping stone toward doing better mechanistic interpretability in the future.

