
24 - Superalignment with Jan Leike


How to Align Superintelligence

So scalable oversight is an example of something that you could build off of reinforcement learning from human feedback, often called RLHF. And I think it would be really exciting if we can have some formal verification in there, or we figure out some kind of learning algorithm that has statistical guarantees. I don't know what would even be possible here, and theoretically feasible, if you have a lot of cognitive labor that you can throw at the problem. But all of these things are very different from the kind of things that we would do right now, or that we would do next.
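For readers unfamiliar with the RLHF building block mentioned above, here is a minimal illustrative sketch (not from the episode) of its reward-modeling step: a small model is trained on pairwise human preferences so that it scores the preferred response above the rejected one, and a policy would then be optimized against that learned reward. The architecture, embedding stand-ins, and hyperparameters are all hypothetical simplifications.

```python
# Illustrative sketch of RLHF's reward-modeling step (assumptions noted above).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # Stand-in for a language-model backbone: maps a response embedding
        # to a single scalar reward.
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: maximize the log-probability that the
    # human-preferred response receives the higher reward.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training loop on random "embeddings" standing in for model outputs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    chosen = torch.randn(32, 64)    # embeddings of human-preferred responses
    rejected = torch.randn(32, 64)  # embeddings of rejected responses
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Scalable oversight techniques are aimed at the step this sketch glosses over: obtaining reliable preference labels even when the responses being compared are too complex for unaided human judgment.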

