
8 - Assistance Games with Dylan Hadfield-Menell
AXRP - the AI X-risk Research Podcast
00:00
The Incentive for Oversight
The goal of the game is to understand the system's incentive to bypass oversight ultimately. And so that's why you have this little bit weird thing of Am, the robot choosing to allow the person to turn it off. So th the key thing in this model that creates an incentive for oversight is that the off event is correlated with the action that you are about to take being bad or at least worse than turning off.
Play episode from 57:55
Transcript


