
Arash Ahmadian on Rethinking RLHF

TalkRL: The Reinforcement Learning Podcast


Exploring Reward Structures and Optimization Techniques in Reinforcement Learning

The chapter covers reward structures in reinforcement learning, comparing optimization techniques such as the bandit formulation and vanilla policy gradient. It discusses how these ideas shaped preference training at a specific company and highlights the importance of curating reward signals for better optimization in RLHF.
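To make the comparison concrete: in the bandit formulation of RLHF, a whole generated completion is treated as a single action that receives one scalar reward, and vanilla policy gradient (REINFORCE) updates the policy in the direction of the log-probability of sampled actions, weighted by reward minus a baseline. The toy sketch below is not from the episode; it shows a REINFORCE update on a K-armed bandit with assumed names (`reinforce_bandit`, `softmax`) and a running-mean baseline for variance reduction.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(rewards, lr=0.1, steps=2000, seed=0):
    """Vanilla policy gradient (REINFORCE) on a K-armed bandit.

    Each arm stands in for one full completion receiving a single
    scalar reward, i.e. the bandit view of RLHF. Toy sketch only;
    function and parameter names are assumptions, not the episode's.
    """
    rng = np.random.default_rng(seed)
    K = len(rewards)
    logits = np.zeros(K)
    baseline = 0.0  # running mean reward, a simple variance-reduction baseline
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(K, p=probs)
        r = rewards[a]
        # For a softmax policy, grad of log pi(a) w.r.t. logits is one_hot(a) - probs.
        grad_logp = -probs
        grad_logp[a] += 1.0
        logits += lr * (r - baseline) * grad_logp
        baseline += 0.05 * (r - baseline)  # update baseline toward recent rewards
    return softmax(logits)

# The policy should concentrate probability on the highest-reward arm.
probs = reinforce_bandit(rewards=[0.1, 0.9, 0.3])
```

With per-completion rewards there is no per-token credit assignment step, which is what makes this simpler than full token-level RL formulations of RLHF.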

