
Goodhart's Law in Reinforcement Learning
Data Skeptic
00:00
How Do You Know if It's Raining?
Ges: I was expecting the optima strategies to be returned each time. The fact that they didn't for one of the learning methods was surprising and it 's not only that they would consistently not return. So deken would consistently return the naive policy, turn out ten times. And if i somehow figured out it's raining now, ad is going to get sunny, perhaps, depending on the reward structure, i should wait. These are the things we'd like to achieve. If you 're waiting for it to be sunny, you might have to wait a number of days. Once you have this quarter correlation of pressure, then the waiting strategy becomes a little less rewarding because you
Transcript
Play full episode