RLHF Roundup: Trying to get good at PPO, charting RLHF's impact, RewardBench retrospective, and a reward model competition

Jun 26, 2024
This episode covers the impact of RLHF on training language models, a retrospective on RewardBench's performance, and a reward modeling competition. It also discusses the challenges and progress of reinforcement learning from human feedback, comparisons between DPO- and PPO-trained models, and a competition on predicting user preferences among large language models.