
Arash Ahmadian on Rethinking RLHF
TalkRL: The Reinforcement Learning Podcast
Debate on Token-Level Actions vs Entire Completion as a Single Action in RL Text Generation
This chapter explores whether RL for text generation should treat each token as a separate action or the entire completion as a single action. It covers how RL models handle tokens as actions and how PPO is implemented in RLHF.