#3: Bandits and Simulators for Recommenders with Olivier Jeunen

Recsperts - Recommender Systems Experts

NOTE

Distinguishing factors between reinforcement learning and bandit learning for defining rewards

The main difference between reinforcement learning and bandit learning lies in how rewards are attributed. In reinforcement learning, a reward is attributed to the sequence of past actions that led to it, so the learner has to work out which actions produce a good result over the long term. In bandit learning, rewards are instantaneous and attributed only to the action taken at that moment. This distinction matters when applying reinforcement learning to problems such as long-term user engagement or relevance: optimizing only for immediate rewards can favor short-term outcomes and lead to clickbait effects.
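The difference in reward attribution can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the episode: the function names, the discount factor `gamma`, and the two example sessions are all assumptions made for the sake of the example. The bandit view credits each action with its own immediate reward, while the reinforcement learning view credits each action with its reward plus the discounted rewards that follow it.

```python
from typing import List


def bandit_targets(rewards: List[float]) -> List[float]:
    """Bandit view: each action is credited only with its own immediate reward."""
    return list(rewards)


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """RL view: each action is credited with its reward plus discounted future rewards.

    gamma is an assumed discount factor, chosen here only for illustration.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical sessions (made-up numbers, 1.0 = positive engagement signal):
# a clickbait item earns an immediate click, after which the user disengages;
# a relevant item earns nothing immediately but is followed by later engagement.
clickbait_session = [1.0, 0.0, 0.0]
relevant_session = [0.0, 1.0, 1.0]

for name, session in [("clickbait", clickbait_session), ("relevant", relevant_session)]:
    first_bandit = bandit_targets(session)[0]
    first_rl = discounted_returns(session)[0]
    print(f"{name}: bandit credit={first_bandit:.2f}, RL credit={first_rl:.2f}")
```

Under these assumed numbers, the bandit view prefers the clickbait first action (credit 1.0 versus 0.0), while the discounted return prefers the relevant one (roughly 1.71 versus 1.0), which is the short-term versus long-term tension the note describes.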
