
Natasha Jaques 2
TalkRL: The Reinforcement Learning Podcast
The Challenges and Limitations of RLHF
Natasha Jaques: OpenAI is taking a different approach than we did in our 2019 paper on human feedback: they train this reward model. In contrast, the stuff I was doing in 2019 was offline RL. So I would use an actual human rating of a specific output and then train on that as one example of a reward. But I didn't have a generalizable reward model that could be applied across more examples. And there's a good argument to be made that the reward-model training approach actually scales pretty well, because you can sample it so many times.
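A rough sketch of the contrast being drawn here, not from the episode: toy feature vectors stand in for model outputs, and a simple least-squares fit stands in for a learned reward model. In the offline-RL setup the reward signal exists only for the outputs a human actually rated, while the trained reward model can score arbitrary new samples, which is what lets RLHF query it many times during training. All names and data below are illustrative assumptions.

```python
# Illustrative sketch only: contrasts (a) offline RL on raw per-output human
# ratings with (b) a reward model fit on those ratings and queried on new samples.
import numpy as np

rng = np.random.default_rng(0)

# Toy "outputs": feature vectors standing in for model responses a human rated.
rated_outputs = rng.normal(size=(50, 8))
true_w = rng.normal(size=8)                       # hidden preference direction (toy)
human_ratings = rated_outputs @ true_w + 0.1 * rng.normal(size=50)

# (a) Offline-RL view: the reward exists only for the outputs that were rated.
offline_rewards = {i: float(r) for i, r in enumerate(human_ratings)}

# (b) Reward-model view: fit a model so it can generalize beyond the rated set.
# A least-squares linear fit stands in for a learned reward model here.
w, *_ = np.linalg.lstsq(rated_outputs, human_ratings, rcond=None)

def reward_model(output_features: np.ndarray) -> np.ndarray:
    """Score arbitrary outputs, including ones no human ever rated."""
    return output_features @ w

# Fresh samples from the policy: the reward model can score all of them,
# which is why it can be "sampled so many times" during RL training.
new_samples = rng.normal(size=(1000, 8))
print("offline rewards cover", len(offline_rewards), "rated outputs")
print("reward model scores ", reward_model(new_samples).shape[0], "new samples")
```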