Inter Active Reward Learning

Inversoro paradime assumes that we were not uncertain in the end. So there can be uncertainty, but like the human isn't in the environment. The inter active reward learning setting is mostly just a thing we made up because it was a thing that people often brought up as like, maybe this will work. We wanted to talk about it and show why it doesn't, doesn't, in fact works.

Play episode from 43:59

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app