
Ajeya Cotra on how Artificial Intelligence Could Cause Catastrophe
Future of Life Institute Podcast
The Future of Interpretability
Interpretability plays a pretty big role right now. I think we don't have super great interpretability abilities. Holden Karnofsky has a blog post out on how we might align transformative AI that's built really soon, and it discusses a number of ideas. We could train models to kind of distill what that interpretability says: if we have some slow procedure for looking at a model's weights with a bunch of humans, then we can potentially dramatically speed up interpretability. That's it for this episode. On the next episode, I talk with Ajeya about how to think clearly in fast-moving worlds, whether the pace of change is accelerating or not.