John Schulman

TalkRL: The Reinforcement Learning Podcast

Generalization and Reward Models in Machine Learning

The prompt is guiding the model. It's like: what corner of the internet do we want to imitate here? And maybe we want to instruct it. So on generalization, I think language models generalize quite well. One of the tricky pieces about RL from human feedback is that you have this reward model and you're actually training against it.

