How to Scale the Self-Play Process in Diplomacy
The way that we've approached self-play in Diplomacy is that we're trying to come up with good intents to condition the language model on. We train a neural net to imitate the human data as closely as possible. And then what we do is we get a better approximation, a better model of human play, by adding planning and RL. But an action would have to have a really high expected value in order for us to deviate from this human-like policy.
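To make the last point concrete, here is a minimal sketch of one way such a trade-off could be written down. It assumes a KL-regularized form in which the planned policy is proportional to the human-imitation ("anchor") probability times `exp(expected_value / lam)`. The speaker does not spell out a formula in this excerpt, so the function name `regularized_policy`, the parameter `lam`, and the toy numbers are all hypothetical and only for illustration.

```python
import math

def regularized_policy(anchor_probs, expected_values, lam):
    """Blend a human-like anchor policy with expected values.

    Assumed form (not stated in the transcript):
        pi(a) is proportional to anchor(a) * exp(Q(a) / lam)
    A large lam keeps pi close to the anchor; a small lam lets
    high-value actions override it.
    """
    logits = []
    for p, q in zip(anchor_probs, expected_values):
        # Actions the anchor never plays stay at probability zero.
        logits.append(math.log(p) + q / lam if p > 0 else float("-inf"))
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy numbers: the anchor rarely plays action 2, but it has the highest value.
anchor = [0.70, 0.25, 0.05]
values = [1.0, 1.2, 3.0]

close_to_human = regularized_policy(anchor, values, lam=100.0)
value_driven = regularized_policy(anchor, values, lam=0.2)
```

With a large `lam`, the result stays near the anchor distribution; with a small `lam`, the mass shifts to the rarely played action only because its expected value is much higher than the alternatives.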