CHAI, Assistance Games, And Fully-Updated Deference

Astral Codex Ten Podcast

Inverse Reinforcement Learning of Human Preferences

We came up with a clever solution: use inverse reinforcement learning to make the AI infer my preferences. We were originally trying to avoid the situation where someone had to hard-code my preferences into an AI and get them right the first time, but now we see we've kicked the can up a meta-level. Even so, an assistance game doesn't need to produce a perfect copy of the human utility function on the first try. The agent isn't a sovereign, but it will probably, most of the time, be corrigible. Why? Suppose you have some hackish implementation of assistance games that doesn't work for one person in particular…
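To make the inverse-reinforcement-learning step concrete, here is a minimal sketch of Bayesian preference inference, assuming a toy setup that is not from the episode: three hypothetical candidate reward functions, a Boltzmann-rational model of the human, and made-up action names and numbers.

```python
import math

# Hypothetical toy setting (not from the episode): the AI is uncertain
# which of several candidate reward functions describes the human.
ACTIONS = ["make_coffee", "make_tea", "do_nothing"]

# Candidate reward hypotheses the AI entertains a priori.
REWARD_HYPOTHESES = {
    "likes_coffee": {"make_coffee": 1.0, "make_tea": 0.1, "do_nothing": 0.0},
    "likes_tea":    {"make_coffee": 0.1, "make_tea": 1.0, "do_nothing": 0.0},
    "wants_quiet":  {"make_coffee": -0.5, "make_tea": -0.5, "do_nothing": 1.0},
}

def boltzmann_likelihood(action, reward, beta=3.0):
    """P(human picks `action` | reward): noisily-rational (Boltzmann) choice."""
    weights = {a: math.exp(beta * reward[a]) for a in ACTIONS}
    return weights[action] / sum(weights.values())

def update_posterior(prior, observed_actions):
    """Bayesian update over reward hypotheses given human demonstrations."""
    posterior = dict(prior)
    for action in observed_actions:
        for name, reward in REWARD_HYPOTHESES.items():
            posterior[name] *= boltzmann_likelihood(action, reward)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

# Uniform prior: the AI starts out maximally unsure about the human's utility.
prior = {name: 1.0 / len(REWARD_HYPOTHESES) for name in REWARD_HYPOTHESES}

# The human is observed making tea twice; the AI updates its beliefs.
posterior = update_posterior(prior, ["make_tea", "make_tea"])
print(posterior)  # most mass shifts to "likes_tea", but not all of it

# The AI acts on *expected* reward under its posterior, never a point
# estimate, so residual uncertainty keeps it open to human correction.
best = max(ACTIONS, key=lambda a: sum(
    posterior[h] * REWARD_HYPOTHESES[h][a] for h in REWARD_HYPOTHESES))
print("AI chooses:", best)
```

The design choice this sketch tries to illustrate is the one the snippet gestures at: because the agent maximizes expected reward under a posterior rather than committing to a point estimate, leftover uncertainty about the human's utility is what plausibly keeps even a hackish assistance-game agent deferential, i.e. corrigible.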
