24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

How to Train a System to Succeed

The goal here is not to have the system stress test our cybersecurity or something, although we should also do that separately; I think that's another effort. The goal is really just: how close are the systems that we are currently training, or that we currently have, to a system like that, one that would be deceptively aligned, that is a coherent liar, that any chance it gets where it thinks humans aren't looking will run specific code? That's exactly what we're looking for. And a priori, they can set up the experiment so that it's hard to do that, but you can very well measure whether the system succeeds. Yeah.

Play episode from 01:00:02