
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
The Myopia of Bazoo
The rate at which you can train a useful agent, and also, I guess, how capable the agent can eventually become with Bazoo, is going to be a function of just how long the periods of training are. So it can be trained on all of the separate episodes. It doesn't have to start from zero knowledge at the beginning of every episode and learn where it is, who it's talking to, what planet it's on. When it's coming up with a model of how its actions produce or don't produce reward, it can use all its prior episodes as information to refine its models. And it can do that without caring about the reward from the


