
John Schulman
TalkRL: The Reinforcement Learning Podcast
Generalization and Reward Models in Machine Learning
The prompt is guiding the model. It's like, what corner of the internet do we want to imitate here? And maybe we want to instruct it. So I think, yeah, I think language models generalize quite well. One of the tricky pieces about RL from human feedback is that you have this reward model and you're actually training against it.
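The setup described here — a learned reward model that the policy is then optimized against — can be illustrated with a minimal toy sketch. This is not the method from the episode; it assumes a tiny discrete set of candidate responses, a fixed proxy reward score per response standing in for the reward model, and a REINFORCE-style update on a softmax policy:

```python
import math
import random

random.seed(0)

# Toy setup (illustrative assumptions, not from the episode):
# three candidate responses, and a "reward model" reduced to a
# fixed proxy score per response.
responses = ["A", "B", "C"]
proxy_reward = {"A": 0.1, "B": 0.9, "C": 0.4}

logits = {r: 0.0 for r in responses}  # policy parameters

def sample():
    """Sample a response from the softmax policy; return it with probs."""
    z = sum(math.exp(v) for v in logits.values())
    probs = {r: math.exp(v) / z for r, v in logits.items()}
    x = random.random()
    acc = 0.0
    for r, p in probs.items():
        acc += p
        if x < acc:
            return r, probs
    return r, probs

# REINFORCE: raise the log-probability of sampled responses in
# proportion to the reward model's score minus a baseline.
lr = 0.5
for _ in range(200):
    r, probs = sample()
    baseline = sum(proxy_reward[k] * probs[k] for k in responses)
    advantage = proxy_reward[r] - baseline
    for k in responses:
        grad_log_prob = (1.0 if k == r else 0.0) - probs[k]
        logits[k] += lr * advantage * grad_log_prob

best = max(logits, key=logits.get)
print(best)
```

The policy concentrates on whichever response the reward model rates highest — which is exactly why training against an imperfect reward model is tricky: the policy exploits the proxy scores, whether or not they reflect true quality.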