Arash Ahmadian on Rethinking RLHF

TalkRL: The Reinforcement Learning Podcast

Debate on Token-Level Actions vs Entire Completion as a Single Action in RL Text Generation

This chapter explores the debate over whether RL for text generation should treat each token as a separate action or the entire completion as a single action. It discusses how RL formulations model tokens as actions and how PPO is implemented in RLHF.
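To make the two framings concrete, here is a minimal sketch in Python. It is not taken from the episode: the function names and the example log-probabilities are hypothetical, and it only illustrates how a likelihood ratio can be computed per token (as in a token-level PPO-style objective) versus once for the whole completion (treating the completion as a single action).

```python
import math

def sequence_logprob(token_logprobs):
    """Log-probability of the whole completion: the sum of its per-token log-probs."""
    return sum(token_logprobs)

def token_level_ratios(new_logprobs, old_logprobs):
    """One likelihood ratio per token: each token is treated as its own action."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]

def sequence_level_ratio(new_logprobs, old_logprobs):
    """A single likelihood ratio: the entire completion is treated as one action."""
    return math.exp(sequence_logprob(new_logprobs) - sequence_logprob(old_logprobs))

# Hypothetical per-token log-probs for a 3-token completion
# under the current policy and the policy that sampled it.
new_lp = [-0.5, -1.2, -0.3]
old_lp = [-0.6, -1.0, -0.4]

per_token = token_level_ratios(new_lp, old_lp)
whole = sequence_level_ratio(new_lp, old_lp)
```

The product of the per-token ratios equals the single sequence-level ratio, so the two views describe the same probabilities. They differ in where rewards, advantages, and clipping are applied: per token, or once per completion.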
