How Do You Train a Behavior Cloning Algorithm?

This is just a classic policy-computing algorithm that you might know from game theory papers. The anchor policies are what keep the strategy at a human level. It's a bit tricky to explain here, but essentially, if you let the model just do reinforcement learning, just do computational planning, you quickly get into a state where the actions become non-human.
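One common way to make the "anchor" idea concrete is KL-regularized planning: pick the policy that maximizes expected value minus a penalty for drifting away from a behavior-cloned human policy. The snippet does not say which exact formulation the speaker uses, so treat the following as a minimal sketch under that assumption. The function name `kl_regularized_policy` and the parameters `anchor`, `q_values`, and `lam` are hypothetical, chosen only for illustration.

```python
import math


def kl_regularized_policy(anchor, q_values, lam):
    """Closed-form maximizer of E_pi[Q] - lam * KL(pi || anchor).

    anchor   : action probabilities from a behavior-cloned (human-like) policy
    q_values : estimated value of each action from planning or RL
    lam      : regularization strength (assumed > 0); larger stays closer to anchor

    The solution is pi(a) proportional to anchor(a) * exp(q(a) / lam).
    Assumes anchor has at least one positive entry.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in log space; actions the anchor never plays keep probability zero.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for p, q in zip(anchor, q_values)
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical numbers: humans rarely pick action 2, but planning rates it highest.
anchor = [0.7, 0.2, 0.1]
q_values = [0.0, 0.0, 1.0]
human_like = kl_regularized_policy(anchor, q_values, lam=1e6)
value_driven = kl_regularized_policy(anchor, q_values, lam=0.01)
```

With a very large `lam` the result stays essentially at the anchor policy, and with a very small `lam` nearly all probability moves to the highest-value action, which is the "non-human" regime the speaker describes.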

