Using Self-Play in Multi-Agent Learning

Another algorithm was inspired by a paper called AlphaStar from DeepMind. They basically bootstrapped the agent using supervised gameplay data from humans. So they pre-train on human data this way, and then they use RL to fine-tune the agents. The idea is that you just bootstrap off of the human behavior-cloned policy and then fine-tune it using self-play.
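To make the two stages concrete, here is a minimal toy sketch of "behavior-clone first, then fine-tune with self-play". It is not AlphaStar's implementation: the rock-paper-scissors game, the made-up "human" action list, the stateless softmax policy, the plain REINFORCE update, and every function name below are assumptions chosen only for illustration.

```python
import math
import random

ACTIONS = ("rock", "paper", "scissors")  # indices 0, 1, 2


def softmax(logits):
    """Turn raw logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def payoff(a, b):
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1


def behavior_clone(human_actions, steps=500, lr=0.5):
    """Stage 1 (supervised): fit the policy to human action frequencies.

    Gradient ascent on the mean log-likelihood of the human actions;
    for a softmax policy that gradient is (empirical freq - policy prob).
    """
    freqs = [human_actions.count(k) / len(human_actions) for k in range(3)]
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        logits = [l + lr * (f - p) for l, f, p in zip(logits, freqs, probs)]
    return logits


def self_play_finetune(logits, episodes=2000, lr=0.05, refresh_every=200, seed=0):
    """Stage 2 (RL): REINFORCE against a frozen snapshot of the policy itself."""
    rng = random.Random(seed)
    logits = list(logits)
    opponent = list(logits)  # the first opponent is the behavior-cloned policy
    for ep in range(episodes):
        if ep > 0 and ep % refresh_every == 0:
            opponent = list(logits)  # refresh the snapshot with the current policy
        probs = softmax(logits)
        a = rng.choices(range(3), weights=probs)[0]
        b = rng.choices(range(3), weights=softmax(opponent))[0]
        r = payoff(a, b)
        # REINFORCE: grad of log pi(a) w.r.t. logits is onehot(a) - probs
        logits = [
            l + lr * r * ((1.0 if k == a else 0.0) - probs[k])
            for k, l in enumerate(logits)
        ]
    return logits


if __name__ == "__main__":
    # Hypothetical "human" data: players who over-use rock.
    human = [0] * 60 + [1] * 30 + [2] * 10
    cloned = behavior_clone(human)
    tuned = self_play_finetune(cloned)
    print("after cloning:  ", [round(p, 3) for p in softmax(cloned)])
    print("after self-play:", [round(p, 3) for p in softmax(tuned)])
```

In this sketch the cloned policy inherits the human bias toward rock, and self-play then shifts probability toward whatever beats the current snapshot. How often the snapshot is refreshed (`refresh_every`, a made-up parameter here) controls how quickly the opponent catches up with the learner.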

