AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Can You Crate a Similator With Perfect Rules?
We're trying to apply that concept of optimizing for long term behavior. We found it useful to try to cast the entire mendation problem as a reinforcement learning problem. So while in the similator its the reward is literally how far you get down this track list, on the sort of mental level, i think long term retention is a good reward to look at. And what is interesting is that you really do see what you expect. It turns out that what you candof hope you see even if you optimize for lower engagement in one moment, you longer term retention can be more effective. From an unexpected point of vew I mean, you might expect your algaritm at any