
John Schulman

TalkRL: The Reinforcement Learning Podcast

CHAPTER

Using Human Feedback to Define a Reward Function

I realized the next frontier was figuring out how to make language models actually useful. I'm still really interested in RL, but solving RL benchmarks isn't the end of the story. To use your RL algorithm, you need a reward function. How exactly you define this reward becomes a challenging and important problem.
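To make the chapter title concrete, here is a minimal sketch of one common way human feedback is turned into a reward function: train a reward model on pairwise human preferences, so its learned score can stand in for a hand-written reward when running RL. This is not a description of the method discussed in the episode; the encoder, dimensions, and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response representation with a scalar reward (sketch)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Placeholder encoder head; in practice this sits on top of a language model.
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (batch, embed_dim) -> (batch,) scalar rewards
        return self.net(features).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # Bradley-Terry style objective: the human-preferred response should score higher.
    return -torch.nn.functional.logsigmoid(
        model(preferred) - model(rejected)
    ).mean()

# Toy usage with random "features" standing in for encoded responses.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred, rejected = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(model, preferred, rejected)
loss.backward()
optimizer.step()
```

Once trained, the reward model's scalar output plays the role of the reward function that an RL algorithm optimizes against.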

