
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
00:00
The Semantic Errors in Reward
It does seem like there is a sense in which it is a mistake to just anthropomorphize RL agents as always quote unquote wanting reward and having some understanding that that's the context. And so I think one upshot here is like you don't just get it for free or by magic that even advanced agents will just be like instantly thinking, hey, how do I get this thing called reward? Yeah.
Play episode from 01:29:53
Transcript


