
Connor Leahy on the State of AI and Alignment Research
Future of Life Institute Podcast
00:00
The Core Problem of Alignment and RLHF
The core problem of alignment is: how do you get a very complex, powerful system that you don't understand to reliably do something complicated in domains where you cannot supervise it? RLHF does not address this problem. It doesn't even claim to address this problem. There's no reason to expect that RLHF should solve this problem. It's like clicker training an alien.
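[Editor's note: For readers unfamiliar with the technique being critiqued, the following is a minimal, illustrative Python sketch of the standard RLHF loop; it is not from the episode, and all function names are hypothetical. It makes the critique concrete: every training signal routes through human judgment, so the method says nothing about domains where humans cannot evaluate the outputs.]

```python
# Toy sketch of the RLHF loop (illustrative only, not a real implementation).
# Note that the only training signal is a reward model fit to human
# preference labels; where humans cannot judge, behavior is unconstrained.
import random

def human_preference(output_a: str, output_b: str) -> str:
    """Stand-in for a human labeler: picks the preferred output.
    Only meaningful in domains a human can actually supervise."""
    return output_a if len(output_a) >= len(output_b) else output_b

def train_reward_model(comparisons):
    """Toy reward model: scores an output by how often it won a comparison."""
    wins = {}
    for winner, loser in comparisons:
        wins[winner] = wins.get(winner, 0) + 1
        wins.setdefault(loser, 0)
    return lambda output: wins.get(output, 0)

def rlhf_step(policy_outputs):
    # 1. Sample pairs of model outputs and collect human comparisons.
    comparisons = []
    for _ in range(10):
        a, b = random.sample(policy_outputs, 2)
        winner = human_preference(a, b)
        comparisons.append((winner, b if winner == a else a))
    # 2. Fit a reward model to those human judgments.
    reward = train_reward_model(comparisons)
    # 3. "Optimize" the policy against the learned reward
    #    (here, simply keep the highest-scoring outputs).
    return sorted(policy_outputs, key=reward, reverse=True)[:2]

outputs = ["short", "a longer answer", "the longest answer of all", "hi"]
print(rlhf_step(outputs))
```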