
Arash Ahmadian on Rethinking RLHF
TalkRL: The Reinforcement Learning Podcast
Exploring Reward Structures and Optimization Techniques in Reinforcement Learning
The chapter covers reward structures in reinforcement learning, comparing optimization approaches such as the bandit formulation and vanilla policy gradient. It discusses how these ideas have shaped preference training at the guest's company and highlights the importance of curating reward signals for more effective optimization in RLHF.
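To make the contrast concrete, here is a minimal sketch (not the speaker's code) of the bandit framing of RLHF paired with a vanilla REINFORCE-style policy-gradient update: each full completion is treated as a single action that receives one scalar reward, and the update weights the sequence log-probability by a baseline-adjusted reward. The toy policy, vocabulary, and reward function below are hypothetical stand-ins for a language model and a learned reward model.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch_size = 16, 8, 4

# Toy "policy": a single categorical distribution over tokens,
# standing in for a language model's per-token distribution.
logits_table = torch.nn.Parameter(torch.zeros(vocab_size))
optimizer = torch.optim.SGD([logits_table], lr=0.1)

def reward_model(sequences: torch.Tensor) -> torch.Tensor:
    # Hypothetical reward: prefer sequences with many even token ids.
    # In RLHF this would be a learned reward model scoring the completion.
    return (sequences % 2 == 0).float().mean(dim=-1)

for step in range(100):
    probs = F.softmax(logits_table, dim=-1)
    dist = torch.distributions.Categorical(probs)

    # Bandit formulation: sample whole completions and assign one
    # scalar reward to each full sequence (no per-step credit).
    sequences = dist.sample((batch_size, seq_len))      # (B, T)
    log_probs = dist.log_prob(sequences).sum(dim=-1)    # log pi(sequence)

    rewards = reward_model(sequences)                    # one reward per sequence
    baseline = rewards.mean()                            # simple variance-reduction baseline

    # Vanilla policy gradient (REINFORCE): maximize E[(R - b) * log pi(sequence)]
    loss = -((rewards - baseline) * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design choice illustrated here is that the reward is attached to the entire sampled sequence rather than to individual steps, which is what distinguishes the bandit view of RLHF from a full multi-step RL treatment.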