Data Skeptic cover image

Goodhart's Law in Reinforcement Learning

Data Skeptic

00:00

How Do You Know if It's Raining?

Ges: I was expecting the optima strategies to be returned each time. The fact that they didn't for one of the learning methods was surprising and it 's not only that they would consistently not return. So deken would consistently return the naive policy, turn out ten times. And if i somehow figured out it's raining now, ad is going to get sunny, perhaps, depending on the reward structure, i should wait. These are the things we'd like to achieve. If you 're waiting for it to be sunny, you might have to wait a number of days. Once you have this quarter correlation of pressure, then the waiting strategy becomes a little less rewarding because you

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app