How to Model Reinforcement Learning
The main thing I have been doing in my papers is focusing on what we will call a click for now. It doesn't really matter whether it's really a click or a different signal, and I think it's very important to think about what you define your reward to be.

But so if we put, let's say, a slight tick mark behind the simulator, even though we maybe need to investigate a bit more which assumptions one needs with regard to the simulator, let's maybe shift our focus to the second component: the reward, or the reward-generating process. How are you modeling this and approaching the problem of granting proper reward to the mechanism that tries to learn