
[07] John Schulman - Optimizing Expectations: From Deep RL to Stochastic Computation Graphs
The Thesis Review
00:00
Reinforcement Learning in Dota: The OpenAI Five Journey
This chapter explores the OpenAI Five project and its use of large-scale reinforcement learning and self-play to win at Dota 2. It traces the progression from one-on-one matches to full five-on-five team play, highlights the effectiveness of the PPO (Proximal Policy Optimization) algorithm, and discusses the challenges of aligning reward systems in model-free approaches.