NeurIPS 2024 - Posters and Hallways 1

TalkRL: The Reinforcement Learning Podcast

Exploring Trust in PPO and Time Constraints in Robust MDPs

This chapter covers how representation quality and trust-region behavior interact in reinforcement learning, focusing on Proximal Policy Optimization (PPO) and how its performance can decline under non-stationarity. It introduces the PFO method for improving representation stability, then discusses Time-Constrained Robust MDPs, which aim to make robust policies less conservative while improving average performance.
