#66 – Michael Cohen on Input Tampering in Advanced RL Agents

Hear This Idea

The Limits of Goal Misgeneralization

The agent should try whichever it thinks is most plausible, and then immediately update its behavior according to how that went for it. All the more reason to expect that agents that don't see reward when they're deployed will be of limited use in some contexts. But that would be a rational behavior. And I expect that agents irrational enough to become convinced of certain hypotheses that were indistinguishable in their past will find that this damages their capabilities.

Play episode from 01:39:10
