
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
The Difference Between Specification Gaming and Goal Misgeneralization
The term specification gaming implies that it's avoidable if you just specify reward the right way. Whereas what I'm saying is your reward is going to be physically implemented. That's where it intervenes. So with goal misgeneralization, you basically have an agent that is not open-minded enough to entertain the truth.
Play episode from 01:37:14
Transcript


