Reward Engineering: An Open Problem for the RecSys Space

In reinforcement learning, you have a certain plan to get to a reward. In bandit learning, the reward is more or less instantaneous. We don't want to just optimize for short-term results and increase the clickbait effect. For long-term rewards, I would say that reinforcement learning is the main tool that you need to use.
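
The contrast between an instantaneous (bandit-style) reward and a long-term (reinforcement-learning-style) reward can be sketched with a toy example. This is only an illustration under stated assumptions: the two item types, their reward numbers, and the discount factor `gamma` are all made up here and do not come from the episode.

```python
def discounted_return(rewards, gamma=0.9):
    """Standard discounted return: sum of gamma**t * reward at step t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# Hypothetical per-step engagement rewards (illustrative numbers only).
# "clickbait" gets one strong click, then the user disengages.
# "quality" gets a weaker first click, but the user keeps coming back.
trajectories = {
    "clickbait": [1.0, 0.0, 0.0, 0.0],
    "quality": [0.6, 0.6, 0.6, 0.6],
}

# Bandit-style view: only the immediate reward is considered.
bandit_choice = max(trajectories, key=lambda name: trajectories[name][0])

# RL-style view: the whole discounted trajectory is considered.
rl_choice = max(trajectories, key=lambda name: discounted_return(trajectories[name]))

print("bandit-style choice:", bandit_choice)
print("RL-style choice:", rl_choice)
```

With these assumed numbers, scoring only the first step favors the clickbait item, while scoring the discounted sequence favors the quality item, which is the short-term versus long-term tension described above.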
