
Ep. 129: Applying the 'security mindset' to AI and x-risk | Jeffrey Ladish
FUTURATI PODCAST
The Future of Alignment Research
I get the sense that Eliezer just isn't super on board with any of them, and they all have a bunch of kind of obvious failure modes. I don't feel like anyone has proposed something that's like, yes, this approach could really work. As for work I'm excited about, one of the areas is just interpretability, or like mechanistic interpretability, and also anomaly detection. Paul Christiano is doing a bunch of stuff at ARC that I think is pretty interesting. Yeah, we're ready for sure.