Reward Functions - Is That Really Imperative?
In the reinforcement learning paradigm, you have a planning module and a reward that together produce the right behavior. But if you then try to interpret the reward as an arbitrary function, you get essentially random behavior. So there are two versions of this that we consider. One is unrealistic in practice, but serves as a good intuition. There's a second version where, instead of assuming that we have access to some reward function, we assume that the human is close to optimal. This means their planning module is close to what would make optimal decisions. And since the human started out as close to optimal, you're probably not going to go all that far wrong.
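To make the "human is close to optimal" assumption concrete, here is a minimal sketch, not from the episode itself, using a Boltzmann-rational choice model in a toy three-action setting. All of the names (`beta`, `true_reward`, the candidate rewards) are illustrative assumptions: a large `beta` corresponds to a near-optimal human, while `beta = 0` collapses to the arbitrary, random behavior mentioned above.

```python
# Sketch only: Boltzmann-rational human model for reward inference.
# Everything here (actions, rewards, beta) is a made-up toy example.
import math
import random

random.seed(0)

ACTIONS = ["a", "b", "c"]

def boltzmann_policy(reward, beta):
    """Action probabilities for a noisily optimal chooser.

    beta -> infinity: perfectly optimal; beta = 0: uniformly random.
    """
    weights = [math.exp(beta * reward[a]) for a in ACTIONS]
    total = sum(weights)
    return {a: w / total for a, w in zip(ACTIONS, weights)}

def log_likelihood(reward, choices, beta):
    """How well a candidate reward explains the observed choices."""
    policy = boltzmann_policy(reward, beta)
    return sum(math.log(policy[c]) for c in choices)

# Hidden reward the (near-optimal) human is roughly optimising.
true_reward = {"a": 1.0, "b": 0.2, "c": -0.5}

# Simulate a near-optimal human (large beta) making 50 choices.
beta_human = 5.0
policy = boltzmann_policy(true_reward, beta_human)
observed = random.choices(ACTIONS, weights=[policy[a] for a in ACTIONS], k=50)

# Score a few candidate reward functions against those observations.
candidates = {
    "prefers a":   {"a": 1.0, "b": 0.0, "c": 0.0},
    "prefers c":   {"a": 0.0, "b": 0.0, "c": 1.0},
    "indifferent": {"a": 0.0, "b": 0.0, "c": 0.0},
}
for name, r in candidates.items():
    print(name, round(log_likelihood(r, observed, beta_human), 2))
```

Under this assumption the observed choices strongly favor candidate rewards that agree with what the human actually prefers, which is the point of the argument: because the human is modeled as close to optimal, the inferred reward is unlikely to be wildly off.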