24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

How to Scale Up an Automated Alignment Research Model

The idea is you want the models to not be too good at these scary tasks. And so one might think that combination of things is inherently scary or dangerous. But I think ultimately this is an empirical question, right? It's really difficult to know in which order skills get unlocked when you scale up the models.

