Using Offline RL for Personalization
We're now thinking of our business more as building a journey rather than getting you to just click on something with a bandit. We use these techniques from baby RL, which is multi-armed bandits. If you have a forgetful RL agent that doesn't know what state it's in, then it's basically a bandit. It turns out RL is also about getting a user to go on a journey and discover new things and enrich the way they use [the product] in their day-to-day life. And so you can't just think of it as a multi-armed bandit in the casino, which doesn't really remember what happened before. You have to really understand that users are impacted by your recommendations.
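To make the bandit-versus-RL contrast concrete, here is a minimal sketch in Python. It is not the speaker's system and it is not offline RL: it is plain value iteration on a made-up two-state user model. The state names, action names, and reward numbers are all hypothetical, chosen only so that a "discovery" recommendation pays nothing right away but moves the user into a state where later recommendations pay more.

```python
# Toy user model. All names and numbers are hypothetical, for illustration only.
STATES = ["casual", "engaged"]
ACTIONS = ["familiar", "discovery"]

# TRANSITIONS[state][action] -> (immediate_reward, next_state)
TRANSITIONS = {
    "casual": {"familiar": (1.0, "casual"), "discovery": (0.0, "engaged")},
    "engaged": {"familiar": (2.0, "engaged"), "discovery": (0.0, "engaged")},
}


def bandit_choice():
    """Stateless agent: score each action by immediate reward averaged over states."""
    scores = {
        a: sum(TRANSITIONS[s][a][0] for s in STATES) / len(STATES)
        for a in ACTIONS
    }
    return max(scores, key=scores.get)


def q_values(gamma=0.9, sweeps=200):
    """State-aware agent: tabular value iteration on the known toy model."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                reward, next_state = TRANSITIONS[s][a]
                # Immediate reward plus discounted value of the best follow-up action.
                q[s][a] = reward + gamma * max(q[next_state].values())
    return q


def rl_policy(gamma=0.9):
    """Greedy policy per state with respect to the converged Q-values."""
    q = q_values(gamma)
    return {s: max(q[s], key=q[s].get) for s in STATES}


if __name__ == "__main__":
    print("bandit always recommends:", bandit_choice())
    print("state-aware policy:", rl_policy())
```

Under these assumed numbers, the stateless agent always picks the familiar item because it only sees the immediate click, while the state-aware agent recommends discovery to casual users since that moves them to the higher-value state. Setting `gamma=0.0` makes the state-aware agent myopic, and it falls back to the familiar item in both states.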