The Reward Function Is a Result of Computing an LP
We were trying to define the problem, to think about what it means to have a task in mind, and to separate that from the issue of how it is then specified as a reward. So he does assume that you have a Markovian process, and the side effect of that is that we have states. In the real world, we have features, right? We don't have states, but we have some observations, and those are an imperfect representation of what the state might be. But I think they know this limitation, for example, of not necessarily having an MDP, or of having features. These are things that are left more for future work, but they could potentially be overcome. It would also be really interesting to