
#3: Bandits and Simulators for Recommenders with Olivier Jeunen
Recsperts - Recommender Systems Experts
Reward Engineering: An Open Problem for the RecSys Space
In reinforcement learning, you follow a plan over several steps to get to a reward. In bandit learning, the reward is more or less instantaneous. We don't want to optimize only for short-term results and encourage clickbait. For long-term rewards, I would say that reinforcement learning is the main tool that you need to use.
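To make the short-term versus long-term contrast concrete, here is a minimal sketch that is not taken from the episode. The function names, the discount factor `gamma`, and the two reward sequences are all made-up assumptions for illustration. It compares a bandit-style objective, which looks only at the immediate reward, with an RL-style discounted return over several steps.

```python
def immediate_reward(rewards):
    """Bandit-style objective: only the reward observed right after the action counts."""
    return rewards[0]


def discounted_return(rewards, gamma=0.9):
    """RL-style objective: sum of rewards over time, each discounted by gamma per step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical per-step rewards (e.g. engagement) after one recommendation.
# A clickbait item gets a click now, then nothing; a quality item pays off steadily.
clickbait = [1.0, 0.0, 0.0, 0.0]
quality = [0.5, 0.5, 0.5, 0.5]

for name, rewards in [("clickbait", clickbait), ("quality", quality)]:
    print(name, immediate_reward(rewards), round(discounted_return(rewards), 4))
```

Under these assumed numbers, the immediate-reward view prefers the clickbait item, while the discounted return prefers the quality item, which is the trade-off the summary describes.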