24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

CHAPTER

How to Train a System to Succeed

The goal here is not to have the system stress-test our cybersecurity or something, although we should also do that separately; I think that's another effort. The goal is really just: how close are the systems that we're currently training, or that we currently have, to a system like that, one that would be deceptively aligned, that is a coherent liar, that any chance it gets where it thinks humans aren't looking will run specific code? That's exactly what we're looking for. And a priori, you can set up the experiment so that it's hard to do that, but you can very well measure whether the system succeeds.
