
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
Using Self Play in Multi-Agent Learning
The algorithm was inspired by a paper called AlphaStar from DeepMind. They basically bootstrapped the agent using supervised gameplay data from humans. So they pre-train on human data this way, and then they use RL to fine-tune the agents. The idea is that you just bootstrap off of the human behavior-cloned policy, and then fine-tune it using self-play.
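
To make the two stages concrete, here is a minimal sketch of the recipe described above: behavior cloning on human data, followed by fine-tuning through self-play. It is a toy, not AlphaStar's actual training setup. The game (rock-paper-scissors), the demonstration data, the function names `behavior_clone` and `self_play_finetune`, and all hyperparameters are assumptions made for illustration. To keep the example deterministic, the fine-tuning stage uses the exact expected policy gradient against a frozen copy of the policy rather than sampled game outcomes.

```python
import math

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[i][j]: reward to a player choosing action i against action j.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def softmax(logits):
    """Turn logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavior_clone(demos, steps=200, lr=1.0):
    """Stage 1: fit policy logits to human actions by maximum likelihood."""
    freq = [demos.count(a) / len(demos) for a in ACTIONS]
    logits = [0.0] * len(ACTIONS)
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean log-likelihood w.r.t. logit i is freq_i - p_i.
        logits = [l + lr * (f - p) for l, f, p in zip(logits, freq, probs)]
    return logits


def self_play_finetune(logits, rounds=1, steps_per_round=20, lr=0.5):
    """Stage 2: improve the policy against frozen snapshots of itself."""
    logits = list(logits)
    n = len(ACTIONS)
    for _ in range(rounds):
        opponent = softmax(logits)  # freeze a copy of the current policy
        # Expected reward of each action against the frozen opponent.
        values = [sum(PAYOFF[i][j] * opponent[j] for j in range(n))
                  for i in range(n)]
        for _ in range(steps_per_round):
            probs = softmax(logits)
            baseline = sum(p * v for p, v in zip(probs, values))
            # Exact policy gradient of expected reward: p_i * (value_i - baseline).
            logits = [l + lr * p * (v - baseline)
                      for l, p, v in zip(logits, probs, values)]
    return logits


# Made-up "human" demonstrations that over-play rock (illustrative only).
human_demos = ["rock"] * 14 + ["paper"] * 3 + ["scissors"] * 3

bc_logits = behavior_clone(human_demos)
ft_logits = self_play_finetune(bc_logits)

print("after behavior cloning:", [round(p, 3) for p in softmax(bc_logits)])
print("after self-play:       ", [round(p, 3) for p in softmax(ft_logits)])
```

In this toy, the cloned policy inherits the human bias toward rock, and one round of self-play shifts probability toward paper, which is the best response to that bias. Setting `rounds` higher refreshes the frozen opponent each round, which is closer in spirit to ongoing self-play.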