
24 - Superalignment with Jan Leike


How to Align Superintelligence

So scalable oversight is an example of something that you could build off of reinforcement learning from human feedback, often called RLHF. And I think it would be really exciting if we can have some formal verification in there, or we figure out some kind of learning algorithm that has statistical guarantees. I don't know what would even be possible here, and theoretically feasible, if you have a lot of cognitive labor that you can throw at the problem. But all of these things are very different from the kind of things that we would do right now, or that we would do next.
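For readers unfamiliar with the RLHF building block mentioned above, here is a minimal illustrative sketch (not from the episode) of its reward-modeling step: a small model is trained on pairwise human preferences so that it scores the preferred response above the rejected one, and a policy would then be optimized against that learned reward. The architecture, embedding stand-ins, and hyperparameters are all hypothetical simplifications.

```python
# Illustrative sketch of RLHF's reward-modeling step (assumptions noted above).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # Stand-in for a language-model backbone: maps a response embedding
        # to a single scalar reward.
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: maximize the log-probability that the
    # human-preferred response receives the higher reward.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training loop on random "embeddings" standing in for model outputs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    chosen = torch.randn(32, 64)    # embeddings of human-preferred responses
    rejected = torch.randn(32, 64)  # embeddings of rejected responses
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Scalable oversight techniques are aimed at the step this sketch glosses over: obtaining reliable preference labels even when the responses being compared are too complex for unaided human judgment.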

