4min chapter

168 - How to Solve AI Alignment with Paul Christiano

Bankless

CHAPTER

The Limits of AI Deception

If we don't code up these systems, the AIs will naturally find a loophole. If that loophole allows the AIs to rate themselves highly and give themselves a reward, that's what they're going to do. The last step is where the AI, you know, colludes with another AI to sort of fudge the numbers. Is there no way to protect against that?

Paul: I think it was genuinely unknown. It's an open empirical question if you trained an AI system to get a lot of reward, and you train in a bunch of cases where being dishonest always failed in practice.
