How to Model Human Bias in a Logical Environment

i think in this case it was actually that the transition dynamics were the same across all the environments, and the value crition metork was allowed to learn a warped version of them. This is more like, when we came up with our model of what the human human planer was doing, we put into it this, like, incorrect model of how the world works. So that is still a difference, but it isn'tlike we learned a planer that gets the correct transition dynamics and then works them a like that. Ah, or possibly i just said a wrong thing earlier. I mean, imean takin about, you've got af of some something ther,. ye?

Play episode from 30:23

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app