
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
The Plausibility of Reward Maximizing Behavior
I think one thing that I might be getting confused about, and maybe this is a dumb question, but it sounds like this all makes sense if you imagine that these advanced agents, quote unquote, care about maximizing something.

Well, what model of the world does that reward-maximizing behavior come from? I don't think there is one that would have retrodicted the past data. So yeah, if you have a myopic agent, you can say that assumption four would fail, that assumption being that the cost of experimenting is small, because now experiments take up a larger fraction of the time that you care about.
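To make the horizon point concrete, here is a minimal Python sketch (not from the episode; the discount factors, step counts, and the helper `experiment_cost_fraction` are all illustrative assumptions). It estimates what fraction of the total discounted reward an agent cares about is taken up by an experiment of fixed length, for a far-sighted versus a myopic agent.

```python
# Illustrative sketch (not from the episode): how much of the reward an agent
# cares about is "spent" on an experiment of fixed length, as a function of
# the agent's effective horizon (set by its discount factor gamma).

def experiment_cost_fraction(gamma: float, experiment_steps: int) -> float:
    """Fraction of total discounted value occupied by the first
    `experiment_steps` steps, assuming a reward of 1 per step."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    total_value = 1.0 / (1.0 - gamma)                          # sum of gamma^t over all t
    experiment_value = (1.0 - gamma ** experiment_steps) / (1.0 - gamma)
    return experiment_value / total_value

# A far-sighted agent (gamma = 0.999, effective horizon ~1000 steps):
print(experiment_cost_fraction(0.999, 10))   # ~0.01: experimenting is cheap
# A myopic agent (gamma = 0.9, effective horizon ~10 steps):
print(experiment_cost_fraction(0.9, 10))     # ~0.65: the experiment eats most of what it values
```

Under these illustrative numbers, a ten-step experiment costs the far-sighted agent roughly 1% of the value it cares about but the myopic agent about 65%, which is the sense in which shrinking the horizon breaks the "cost of experimenting is small" assumption.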


