3min chapter

2 - Learning Human Biases with Rohin Shah

AXRP - the AI X-risk Research Podcast

CHAPTER

Reward Functions - Is That Really Imperative?

In the reinforcement learning paradigm, you get a planning module and a reward that together produce the right behavior. But if you then try to interpret the reward as an arbitrary function, you get, like, random behavior. So there are two versions of this that we consider. One is unrealistic in practice, but serves as a good intuition. There's a second version where, instead of assuming that we have access to some reward functions, we assume that the human is close to optimal. This means their planning module is close to one that would make optimal decisions. And since it started out as being close to optimal, you're probably not going to go all the...
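A minimal sketch of the second idea described above, under assumptions not stated in the episode: a differentiable planner, a small discrete state space, and PyTorch. All names, shapes, and hyperparameters here are illustrative, not the setup from the underlying paper. The planner is first fit on tasks whose rewards are known, so it starts out close to optimal, and a reward is then learned so that this near-optimal planner reproduces the human's demonstrated choices.

import torch
import torch.nn as nn

N_STATES, N_ACTIONS = 10, 4  # toy sizes, assumed for illustration

class Planner(nn.Module):
    # Maps a reward vector (one value per state) to log action probabilities per state.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_STATES, 64), nn.ReLU(),
            nn.Linear(64, N_STATES * N_ACTIONS),
        )

    def forward(self, reward):
        logits = self.net(reward).view(N_STATES, N_ACTIONS)
        return torch.log_softmax(logits, dim=-1)  # log pi(a | s)

def pretrain_near_optimal(planner, known_rewards, optimal_policies, steps=500):
    # Phase 1 (the "close to optimal" assumption): fit the planner on tasks
    # whose rewards are known, so it starts out approximately optimal.
    opt = torch.optim.Adam(planner.parameters(), lr=1e-3)
    for _ in range(steps):
        for r, pi_star in zip(known_rewards, optimal_policies):
            loss = nn.functional.kl_div(planner(r), pi_star, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()

def infer_reward(planner, demo_states, demo_actions, steps=500):
    # Phase 2: learn an unknown reward (and fine-tune the planner) so that the
    # near-optimal planner reproduces the human's demonstrated actions.
    reward = torch.zeros(N_STATES, requires_grad=True)
    opt = torch.optim.Adam([reward] + list(planner.parameters()), lr=1e-3)
    for _ in range(steps):
        log_pi = planner(reward)
        loss = nn.functional.nll_loss(log_pi[demo_states], demo_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return reward.detach()

Because the planner starts near optimal and is only fine-tuned, the inferred reward stays tied to behavior that is roughly rational, rather than being explained away by an arbitrary planner.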
