Rambo: A Practical Pessimistic Agent

I've done some work on an imitation learner that works this way, and with the remaining probability mass it just asks for more demonstrations. That lets it eventually learn, basically by getting more data in the contexts where it's least sure. So if you use something like that instead of the first sort of imitation learning you'd think of, you might be able to overcome this. You get it to be somewhat more unimaginative than would be rational.
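One way to picture the "remaining probability mass" idea is the rough Python sketch below. It is an illustration under stated assumptions, not the speaker's actual algorithm: it assumes the learner keeps a handful of plausible models of the demonstrator, commits to each action only with the probability that every model agrees on, and spends whatever mass is left over on asking for a demonstration. The function names and the `ASK_FOR_DEMONSTRATION` sentinel are hypothetical.

```python
import random

ASK_FOR_DEMONSTRATION = "ASK_FOR_DEMONSTRATION"  # hypothetical sentinel


def pessimistic_action_distribution(model_predictions):
    """Return (agreed, ask_probability).

    model_predictions: list of dicts mapping action -> probability,
    one dict per plausible model of the demonstrator (an assumption
    of this sketch).
    """
    actions = set().union(*model_predictions)
    # Commit to an action only as much as the most sceptical model allows.
    agreed = {
        a: min(p.get(a, 0.0) for p in model_predictions) for a in actions
    }
    # Whatever mass the models disagree about goes to asking for help.
    ask_probability = max(0.0, 1.0 - sum(agreed.values()))
    return agreed, ask_probability


def act_or_ask(model_predictions, rng=random):
    """Sample an action, or the sentinel with the leftover probability."""
    agreed, _ = pessimistic_action_distribution(model_predictions)
    r = rng.random()
    cumulative = 0.0
    for action in sorted(agreed):
        cumulative += agreed[action]
        if r < cumulative:
            return action
    return ASK_FOR_DEMONSTRATION


if __name__ == "__main__":
    # Two models that disagree about how often the demonstrator goes left.
    models = [
        {"left": 0.9, "right": 0.1},
        {"left": 0.5, "right": 0.5},
    ]
    agreed, ask = pessimistic_action_distribution(models)
    print(agreed, ask)
    print(act_or_ask(models))
```

In this sketch, the more the models disagree in a given context, the more often the learner asks, so new demonstrations land where it is least sure. When the models agree exactly, the leftover mass is zero and it simply acts.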
