AXRP - the AI X-risk Research Podcast cover image

8 - Assistance Games with Dylan Hadfield-Menell

AXRP - the AI X-risk Research Podcast

00:00

The Incentive for Oversight

The goal of the game is to understand the system's incentive to bypass oversight ultimately. And so that's why you have this little bit weird thing of Am, the robot choosing to allow the person to turn it off. So th the key thing in this model that creates an incentive for oversight is that the off event is correlated with the action that you are about to take being bad or at least worse than turning off.

Play episode from 57:55
Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app