Hear This Idea cover image

#66 – Michael Cohen on Input Tampering in Advanced RL Agents

Hear This Idea

00:00

How to Expect an Agent to Understand the World

So when the agent is learning how different actions, different sequences of actions would lead to different rewards and coming up with hypotheses, at least as well as us, there are two hypotheses that I can think of that would explain past data equally well. The first is the reward I get is equal to the number that the box displays. And you could imagine others like the reward gets sent down some wire after it's been processed by the optical character recognition program. You could think of a whole class of these, but for now, let's just think of these two to understand the concepts at play.

Play episode from 36:38
Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app