John Schulman

TalkRL: The Reinforcement Learning Podcast

Using Human Feedback to Define a Reward Function

I realized the next frontier was figuring out how to make language models actually useful. I'm still really interested in RL, but solving RL benchmarks isn't the end of the story. To use your RL algorithm, you need a reward function, and how exactly to define this reward becomes a challenging and important problem.
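As context for the quote above (not something walked through on the episode itself), a common way to define a reward from human feedback is to fit a reward model to pairwise preference comparisons, using a Bradley-Terry loss over a "chosen" vs. "rejected" response. The sketch below is a minimal, illustrative version: the `RewardModel` architecture, feature dimensions, and random stand-in embeddings are all assumptions for demonstration, not the setup Schulman describes.

```python
# Minimal sketch: learning a scalar reward from pairwise human preferences.
# All names, shapes, and the toy data here are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a fixed-size representation of (prompt, response) to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # One scalar reward per example.
        return self.score(features).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training step on random embeddings standing in for (prompt, response) pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen_feats = torch.randn(32, 128)    # responses humans preferred
rejected_feats = torch.randn(32, 128)  # responses humans ranked lower

loss = preference_loss(model(chosen_feats), model(rejected_feats))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The learned reward model then stands in for a hand-written reward function when running an RL algorithm (e.g., policy-gradient fine-tuning) on the language model.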
