
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
Rambo: A Practical Pessimistic Agent
I've done some work on an imitation learner that works this way: for the remaining probability mass, it just asks for more demonstrations. This enables it to eventually learn, basically, by getting more data in the contexts where it's least sure. So if you use something like that instead of the first sort of imitation learning you'd think of, you might be able to overcome this. You get it to be somewhat more unimaginative than would be rational.
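The idea described above can be sketched in code. This is a hypothetical illustration, not Cohen's actual algorithm: a learner that imitates an action only when enough of its probability mass is concentrated on that action, and otherwise defers by requesting a fresh demonstration. The class name, threshold, and count-based confidence estimate are all assumptions made for the sketch.

```python
from collections import defaultdict

class CautiousImitator:
    """Illustrative sketch: imitate when confident, otherwise ask the
    demonstrator for another demonstration (the 'remaining probability
    mass' triggers a query rather than a guess)."""

    def __init__(self, confidence_threshold=0.9, min_observations=5):
        self.threshold = confidence_threshold
        self.min_obs = min_observations
        # counts[state][action] = times the demonstrator chose `action` in `state`
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action):
        self.counts[state][action] += 1

    def act(self, state, demonstrator):
        actions = self.counts[state]
        total = sum(actions.values())
        if total >= self.min_obs:
            best, n = max(actions.items(), key=lambda kv: kv[1])
            if n / total >= self.threshold:
                return best  # confident enough: imitate the modal action
        # Not sure enough: defer to the demonstrator and learn from the demo.
        demo_action = demonstrator(state)
        self.observe(state, demo_action)
        return demo_action
```

The effect is the conservatism mentioned in the quote: early on, the agent queries rather than extrapolates, and it only acts autonomously where the data has made it confident.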


