RLHF 201 - with Nathan Lambert of AI2 and Interconnects

Latent Space: The AI Engineer Podcast

Insights on Process Reward Models and Human-Centric RL in NLP

Process reward models assign a reward to each step in a chain-of-thought, giving finer-grained credit assignment across intermediate states than a single outcome reward. There is ongoing debate about whether chain-of-thought reasoning is best framed as reinforcement learning (RL). Comparing pre-deep RL with deep RL shows that much of the current RL work in NLP originated outside of NLP, before deep learning became prevalent. Human-centric RL replaces a predefined reward function with a human who directly scores the agent's actions as the reward signal.
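The contrast between a single outcome reward and per-step process rewards can be sketched in a few lines. This is a toy illustration, not any real PRM implementation: the function names and the `toy_scorer` stand-in for a learned (or human) step scorer are assumptions for the sketch.

```python
from typing import Callable, List

def outcome_reward(steps: List[str], is_correct: bool) -> List[float]:
    """Outcome reward: one scalar for the whole chain of thought,
    delivered only at the final step."""
    return [0.0] * (len(steps) - 1) + [1.0 if is_correct else 0.0]

def process_rewards(steps: List[str],
                    score_step: Callable[[str], float]) -> List[float]:
    """Process reward: a score for every intermediate step, giving the
    finer-grained credit assignment described above."""
    return [score_step(s) for s in steps]

chain = ["2 + 3 = 5", "5 * 4 = 20", "so the answer is 20"]

# Hypothetical per-step scorer; in practice this would be a learned
# process reward model or a human rating each step's validity.
toy_scorer = lambda s: 1.0 if "=" in s or "answer" in s else 0.0

print(outcome_reward(chain, is_correct=True))  # reward only at the end
print(process_rewards(chain, toy_scorer))      # reward at every step
```

The outcome version returns `[0.0, 0.0, 1.0]`, while the process version returns a score per step, which is what lets a PRM distinguish a chain that fails early from one that fails at the last step.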

Play episode from 15:30