
Ajeya Cotra on how Artificial Intelligence Could Cause Catastrophe
Future of Life Institute Podcast
The Future of Interpretability
Interpretability plays a pretty big role right now. I think we don't have super great interpretability abilities. Holden Karnofsky has a blog post out on how we might align transformative AI that's built really soon, and it discusses a number of ideas. We could train models to kind of distill what that interpretability says: if we have some slow procedure for looking at a model's weights with a bunch of humans, then we can potentially dramatically speed up interpretability. That's it for this episode. On the next episode, I talk with Ajeya about how to think clearly in fast-moving worlds, whether the pace of change is accelerating or not.