Reward Outcomes, Not Actions of the Agent
You need to symmetrically subtract points for going away from the destination if you're rewarding the agent for going towards it. So really, what you're rewarding is the agent's position, not their behavior. I think this is a very deep result that has implications not just in AI but for thinking about human incentives and parenting and things like this. But increasingly, people like Stuart Russell are making the argument that we just shouldn't manually design rewards at all, because we just have too bad a track record. Hmm, so you have to focus on outcomes, not process. Yes.
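To make the symmetry point concrete, here is a minimal sketch (not from the episode; the corridor environment, the potential function, and the discount are illustrative assumptions) of potential-based reward shaping in the style of Ng, Harada and Russell. When the bonus for moving toward the destination is exactly mirrored by a penalty for moving away, the shaping terms telescope, so the total shaping reward depends only on where the agent starts and ends, i.e. on its position rather than on its actions.

```python
# Sketch: potential-based reward shaping on a 1-D corridor.
# Assumptions (not from the transcript): goal at position GOAL,
# potential Phi(s) = -distance_to_goal(s), and discount GAMMA = 1.

GOAL = 10
GAMMA = 1.0  # undiscounted, to keep the telescoping obvious

def phi(state: int) -> float:
    """Potential: negative distance to the destination."""
    return -abs(GOAL - state)

def shaping_bonus(state: int, next_state: int) -> float:
    """Symmetric shaping term F(s, s') = GAMMA * Phi(s') - Phi(s).
    A step toward the goal earns +1, a step away costs -1."""
    return GAMMA * phi(next_state) - phi(state)

def shaped_return(trajectory: list[int]) -> float:
    """Sum of shaping bonuses along a trajectory of positions."""
    return sum(shaping_bonus(s, s2) for s, s2 in zip(trajectory, trajectory[1:]))

if __name__ == "__main__":
    direct = [0, 1, 2, 3]           # walks straight toward the goal
    wandering = [0, 1, 0, 2, 1, 3]  # backtracks along the way

    # Both trajectories collect the same total, Phi(end) - Phi(start) = 3.0:
    # the shaping reward only tracks the agent's position, not its behavior.
    print(shaped_return(direct))     # 3.0
    print(shaped_return(wandering))  # 3.0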