The Future of Alignment Research

I get the sense that Eliezer just isn't super on board with any of them, and they all have a bunch of kind of obvious failure modes. I don't feel like anyone has proposed something that's like, yes, this approach could really work. As for work I'm excited about, one of the areas is just interpretability, or like mechanistic interpretability. Also anomaly detection. Paul Christiano is doing a bunch of stuff at ARC that I think is pretty interesting. Yeah, we're ready for sure.
