24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

The Alignment Tax

The overall goal here is not to build the most capable automated alignment researcher that we could possibly build with the tech that we have, but rather to build something that is really, really useful, that we can scale up a lot, and most importantly, that we trust is aligned enough to hand off these tasks to. If we introduce a fair amount of inefficiencies in this process, where we're essentially sandbagging the model's capabilities by training it in this way, I don't think that matters as much.

