CICERO: An AI agent that negotiates, persuades, and cooperates with people

Yannic Kilcher Videos (Audio Only)

How Do You Train a Behavior Cloning Algorithm?

This is just a very classic policy-computing algorithm that you might know from game theory papers. The anchor policies are what keep the strategy at a human level. It's a bit tricky to explain just here, but essentially, if you let the model just do reinforcement learning, just do computational planning, you quickly get into a state where the actions become non-human.
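The snippet does not spell out how the anchor policy enters the computation, so the following is only a minimal sketch under an assumed form: the planned policy is taken to be proportional to `anchor(a) * exp(value(a) / lam)`, which is one common way to regularize a policy toward a reference distribution. The function name `anchored_policy`, the parameter `lam`, the action names, and all numbers are hypothetical and chosen for illustration.

```python
import math


def anchored_policy(action_values, anchor_policy, lam):
    """Blend planned action values with a human-like anchor policy.

    Assumed form (not taken from the source):
        policy(a) is proportional to anchor(a) * exp(value(a) / lam)
    A large lam keeps the result close to the anchor; a small lam lets the
    highest-value action dominate.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Subtract the max value before exponentiating for numerical stability.
    max_value = max(action_values.values())
    weights = {
        action: anchor_policy[action]
        * math.exp((action_values[action] - max_value) / lam)
        for action in action_values
    }
    total = sum(weights.values())
    return {action: weight / total for action, weight in weights.items()}


# Hypothetical values from planning and a hypothetical behavior-cloned anchor.
values = {"hold": 0.0, "support_ally": 0.2, "surprise_attack": 1.0}
anchor = {"hold": 0.5, "support_ally": 0.45, "surprise_attack": 0.05}

near_human = anchored_policy(values, anchor, lam=100.0)  # stays near the anchor
near_greedy = anchored_policy(values, anchor, lam=0.01)  # drifts to the top-value action

print(near_human)
print(near_greedy)
```

With a large `lam` the output stays close to the anchor distribution, and with a small `lam` almost all probability moves to the highest-value action even though the anchor rates it as unlikely, which mirrors the drift toward non-human actions described above.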
