

NeurIPS 2024 - Posters and Hallways 1
Mar 3, 2025
This discussion dives into innovative methods for unsupervised skill discovery in hierarchical reinforcement learning, using driving as a practical example. It also tackles trust-region failures in Proximal Policy Optimization caused by collapsing representations, and introduces Time-Constrained Robust MDPs for improved performance. Sustainability in supercomputing is highlighted, showcasing AI's role in reducing energy consumption. Additionally, there's a focus on standardizing multi-agent reinforcement learning for better control, and on exploration strategies for settings where rewards are sparse or not directly observable.
AI Snips
Structured Skill Spaces for Hierarchical RL
- Skills found by unsupervised skill discovery methods are often hard to repurpose for downstream tasks.
- Injecting structure into the skill space, such as tying each skill to specific state dimensions, improves downstream task learning.
Car Example for Disentangled Skills
- Imagine a car where one skill controls velocity and another controls orientation.
- This disentangled skill space simplifies downstream task learning compared to an entangled one; see the sketch after this list.
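
The episode doesn't include code, but the car example translates directly into a toy intrinsic reward. Below is a minimal sketch, assuming a two-dimensional skill vector whose components are tied to the velocity and orientation dimensions of the state; the names (`STATE_DIMS`, `disentangled_intrinsic_reward`) and the reward form are illustrative assumptions, not taken from any paper discussed.

```python
import numpy as np

# Illustrative sketch of a disentangled intrinsic reward for skill discovery.
# Assumption: each skill component is tied to exactly one state dimension
# (z[0] -> velocity, z[1] -> orientation), keeping skills disentangled.

STATE_DIMS = {"velocity": 0, "orientation": 1}  # indices into the state vector

def disentangled_intrinsic_reward(state, next_state, z):
    """Reward each skill component only for change along its own state dimension.

    state, next_state: np.ndarray of shape (state_dim,)
    z: np.ndarray of skill components, e.g. z[0] targets velocity,
       z[1] targets orientation.
    """
    reward = 0.0
    for skill_idx, state_idx in enumerate(STATE_DIMS.values()):
        delta = next_state[state_idx] - state[state_idx]
        # Positive reward when the change along the tied state dimension
        # matches the direction and magnitude requested by that skill component.
        reward += z[skill_idx] * delta
    return reward

# Toy usage: skill z = (+1, 0) asks the car to speed up without turning.
s  = np.array([0.0, 0.0])   # [velocity, orientation]
s2 = np.array([0.5, 0.0])
print(disentangled_intrinsic_reward(s, s2, z=np.array([1.0, 0.0])))  # 0.5
```

A downstream controller can then command the velocity and orientation components independently, which is what makes the structured skill space easier to repurpose than an entangled one.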
Representation Collapse in PPO
- PPO's trust region can fail when representations deteriorate: they collapse, making distinct states indistinguishable to the network.
- Regularizing the representations, e.g. with Proximal Feature Optimization (PFO), keeps the trust region effective and mitigates performance collapse; a sketch follows below.
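
The snip names PFO without giving details. The sketch below shows one plausible form of representation regularization: the standard PPO clipped objective plus an auxiliary penalty on how far the current features drift from those recorded under the old policy. The exact loss form and names such as `pfo_coef` are assumptions for illustration, not the paper's verified implementation.

```python
import torch

def ppo_loss_with_pfo(ratio, advantage, features, old_features,
                      clip_eps=0.2, pfo_coef=1.0):
    """PPO clipped objective plus a feature-drift regularizer (illustrative).

    ratio:        pi_new(a|s) / pi_old(a|s), shape (batch,)
    advantage:    advantage estimates, shape (batch,)
    features:     current hidden representations phi(s), shape (batch, d)
    old_features: representations recorded at rollout time
    """
    # Standard PPO clipped surrogate (maximized, hence the minus sign).
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    policy_loss = -torch.min(unclipped, clipped).mean()

    # PFO-style regularizer (assumed form): keep the representation close
    # to what it was under the old policy, so states that were
    # distinguishable stay distinguishable and the trust region stays meaningful.
    pfo_loss = (features - old_features.detach()).pow(2).sum(dim=-1).mean()

    return policy_loss + pfo_coef * pfo_loss
```

The intuition: clipping the probability ratio alone cannot stop a shared encoder's outputs from collapsing, so the auxiliary term constrains drift in feature space directly.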