
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
00:00
The Distal Model and the Proximal Model of the World
If the agent never interrupts this information channel, I think you should say they're both equally correct. But in a world where that's on the table, if it does experiment with these, then it would end up favoring one over the other. The model that says I get reward when I win a chess strikes me as way, way simpler than the model that saying I'm on planet earth. This could succeed or fail for different agents differently,. Depending on the way that we're giving it reward and the environment that it's in.
Play episode from 41:04
Transcript


