AXRP - the AI X-risk Research Podcast

21 - Interpretability for Engineers with Stephen Casper

The Importance of Intrinsic Interpretability in AI Safety

There are a lot of nice properties that intrinsic interpretability techniques can add to neural nets. And there are lots of different techniques that don't conflict with one another. I think it might be very interesting, sometime in the near future, to just work on more intrinsically interpretable architectures as a stepping stone toward doing better mechanistic interpretability in the future.

