
[07] John Schulman - Optimizing Expectations: From Deep RL to Stochastic Computation Graphs
The Thesis Review
00:00
Exploring Stochastic Computation in RL
This chapter examines stochastic computation graphs and their role in reinforcement learning, focusing on gradient-based optimization of objectives that involve random variables. It covers Trust Region Policy Optimization (TRPO) and the transition to Proximal Policy Optimization (PPO), including the trade-off between keeping each policy update small and still making effective training progress. The conversation also stresses that practical implementation matters as much as theoretical foundations, and reflects on how these methods evolved through trial and error.
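As a rough illustration of what optimizing through a random variable can look like, here is a minimal sketch of the score function gradient estimator, one common way to estimate such gradients. It is not taken from the episode: the function name `score_function_gradient`, the unit-variance Gaussian, and the test function `f(x) = x * x` are all assumptions chosen for illustration.

```python
import random


def score_function_gradient(theta, f, num_samples=200_000, seed=0):
    """Estimate d/dtheta E[f(x)] for x ~ N(theta, 1) by Monte Carlo.

    Uses the identity
        d/dtheta E[f(x)] = E[f(x) * d/dtheta log p(x; theta)],
    where for a unit-variance Gaussian the score is (x - theta).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs match
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(theta, 1.0)   # sample the random variable
        score = x - theta           # d/dtheta log N(x; theta, 1)
        total += f(x) * score       # single-sample gradient estimate
    return total / num_samples


# Example: f(x) = x^2, so E[f(x)] = theta^2 + 1 and the exact gradient is 2 * theta.
estimate = score_function_gradient(1.5, lambda x: x * x)
print(estimate)  # should land near 3.0, up to Monte Carlo noise
```

The estimator never differentiates `f` itself, only the log-density of the sampling distribution, which is why it applies even when `f` is a black box such as an environment's return. The price is high variance, so practical methods usually subtract a baseline.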