24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

How to Scale Up an Automated Alignment Research Model

The idea is you want the models to not be too good at these scary tasks. And so one might think that combination of things is inherently scary or dangerous. But I think ultimately this is an empirical question, right? It's really difficult to know in which order skills get unlocked when you scale up the models.

