Reward Engineering: An Open Problem for the RecSys Space
In reinforcement learning, you have a certain plan to get to a reward. In bandit learning, the reward is going to be more or less instantaneous. We don't want to optimize only for short-term results and amplify the clickbait effect. For long-term rewards, I would say that reinforcement learning is the main tool that you need to use.
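
To make the contrast concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than something stated in the episode: the function names, the discount factor `gamma=0.9`, and the two made-up reward sequences. It only shows how scoring an item by its immediate reward (the bandit-style view) can favour clickbait, while scoring it by a discounted sum of future rewards (the reinforcement learning view) can favour the item that keeps paying off.

```python
def immediate_reward(rewards):
    """Bandit-style view: only the reward observed right after the action counts."""
    return rewards[0]


def discounted_return(rewards, gamma=0.9):
    """RL-style view: sum of all future rewards, discounted by gamma per step."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Hypothetical reward sequences (assumed for illustration only):
# a clickbait item gets one click and nothing afterwards,
# a quality item gets a smaller but sustained reward.
clickbait = [1.0, 0.0, 0.0, 0.0]
quality = [0.5, 0.5, 0.5, 0.5]

print("immediate:", immediate_reward(clickbait), immediate_reward(quality))
print("discounted:", discounted_return(clickbait), discounted_return(quality))
```

Under these assumed numbers the immediate-reward view ranks the clickbait item higher, while the discounted return ranks the quality item higher, which is the short-term versus long-term trade-off described above.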