Reward Actions, Not Actions of the Agent

You need to symmetrically subtract points for going away from the nation if you're rewarding them for going towards the destination. So really, what you're rewarding is the agent's position, not their behavior. I think this is a very deep result that has implications, not just in a i but for thinking about human incentives and parenting and things like this. Ho increasingly, people like stuart russell are making the argument that we just shouldn't manually design rewards at all because we just have too bad a track record. A hum so you have to have to fick us on outcomes, not process. Yes.

Play episode from 29:38

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app