Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents

Generally Intelligent

Using Self-Play in Multi-Agent Learning

Another algorithm was inspired by a paper called AlphaStar from DeepMind. They basically bootstrapped the agent using supervised gameplay data from humans. They pre-train on human data this way, and then they use RL to fine-tune the agents. The idea is that if you were to just bootstrap off of the human behavior-cloned policy, and then fine-tune it using self-play...
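The two stages described here (supervised pre-training on human play, then fine-tuning with self-play) can be sketched on a toy problem. The following is a minimal illustration, not AlphaStar's actual method: it assumes a rock-paper-scissors game, a policy that is just three softmax logits, and exact expected-payoff gradients in place of sampled RL updates. The names `behavior_clone`, `self_play_finetune`, and `expected_payoff` are hypothetical helpers introduced for this example.

```python
import math

# Row player's payoff in rock-paper-scissors (index 0=rock, 1=paper, 2=scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_payoff(probs, opponent):
    """Expected payoff of mixed strategy `probs` against `opponent`."""
    return sum(probs[a] * PAYOFF[a][b] * opponent[b]
               for a in range(3) for b in range(3))

def behavior_clone(human_actions, steps=500, lr=0.5):
    """Stage 1 (assumed form): fit logits to human actions by maximum likelihood."""
    n = len(human_actions)
    freq = [human_actions.count(a) / n for a in range(3)]
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean log-likelihood w.r.t. the logits is (freq - probs).
        logits = [l + lr * (f - p) for l, f, p in zip(logits, freq, probs)]
    return logits

def self_play_finetune(logits, rounds=1, steps_per_round=20, lr=0.1):
    """Stage 2 (assumed form): improve the policy against frozen copies of itself."""
    logits = list(logits)
    for _ in range(rounds):
        opponent = softmax(logits)  # frozen snapshot of the current policy
        for _ in range(steps_per_round):
            probs = softmax(logits)
            # Expected payoff of each pure action against the frozen snapshot.
            q = [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]
            value = sum(p * qa for p, qa in zip(probs, q))
            # Exact policy gradient for a softmax policy: p_a * (q_a - value).
            logits = [l + lr * p * (qa - value) for l, p, qa in zip(logits, probs, q)]
    return logits

# Made-up "human" data for illustration: mostly rock, some paper, little scissors.
human_actions = [0] * 6 + [1] * 3 + [2] * 1
cloned = behavior_clone(human_actions)
tuned = self_play_finetune(cloned, rounds=1)
```

Here `softmax(cloned)` approximates the human action frequencies, and `expected_payoff(softmax(tuned), softmax(cloned))` measures how much one round of self-play has improved on the cloned starting point.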
