#3: Bandits and Simulators for Recommenders with Olivier Jeunen

Recsperts - Recommender Systems Experts

NOTE

How reinforcement learning and bandit learning differ in attributing rewards

The main difference between reinforcement learning and bandit learning lies in how rewards are attributed. In reinforcement learning, a reward is credited to the sequence of past actions that led to it, so the learner has to work out which actions produce good results over the long term. In bandit learning, rewards are instantaneous and attributed only to the action taken at that moment. This distinction matters when applying reinforcement learning to problems such as long-term user engagement or relevance: optimizing only for immediate outcomes can lead to clickbait effects, and long-term credit assignment helps avoid them.
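To make the contrast concrete, here is a minimal Python sketch of the two ways of crediting actions. It is an illustration under stated assumptions, not anything presented in the episode: the function names, the discount factor `gamma`, and the reward values for the two sessions are all hypothetical. The reinforcement learning side uses the standard discounted return, which is one common way to spread credit over a sequence of actions.

```python
from typing import List


def bandit_credit(rewards: List[float]) -> List[float]:
    """Bandit view: each action is credited only with its own immediate reward."""
    return list(rewards)


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """RL view: each action is credited with its own reward plus the
    discounted rewards of everything that happened after it."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical sessions (made-up numbers, for illustration only).
# Clickbait: the first recommendation gets a click, then the user disengages.
clickbait_session = [1.0, 0.0, 0.0]
# Relevant: a weaker first signal, but the user keeps engaging afterwards.
relevant_session = [0.5, 1.0, 1.0]

for name, session in [("clickbait", clickbait_session),
                      ("relevant", relevant_session)]:
    first_bandit = bandit_credit(session)[0]
    first_rl = discounted_returns(session)[0]
    print(f"{name}: bandit credit = {first_bandit:.2f}, "
          f"discounted return = {first_rl:.2f}")
```

Under these made-up numbers, the bandit view prefers the clickbait first action because its immediate reward is higher, while the discounted return prefers the relevant one because it also accounts for the engagement that followed.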
