
Data Brew by Databricks

Reward Models | Data Brew | Episode 40

Mar 20, 2025
Brandon Cui, a Research Scientist at MosaicML and Databricks, specializes in AI model optimization and leads RLHF efforts. In this discussion, he explains how synthetic data and RLHF can fine-tune models for better outcomes, exploring techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) that improve model responses. Brandon also emphasizes the critical role of reward models in boosting performance on coding, math, and reasoning tasks, while highlighting the necessity of human oversight in AI training.
39:58

Podcast summary created with Snipd AI

Quick takeaways

  • Reward models utilize pairwise preferences to efficiently gather user feedback, enabling language model fine-tuning for improved response quality.
  • The exploration of fine-grained reward models allows for targeted evaluations of specific segments in generated responses, enhancing error identification and correction.

Deep dives

Understanding Reward Models

Reward models are essential for scoring the quality of generated content by assessing whether it meets specific criteria, such as helpfulness or safety. These models are trained using pairwise preferences, where two responses to a prompt are evaluated to determine which is superior. This approach allows for feedback to be gathered efficiently, as human evaluators can easily indicate which response is better without the need for in-depth analysis. The insights gained from reward models enable researchers to refine language models to generate responses that align more closely with user needs.
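The pairwise-preference training described above is commonly formalized as a Bradley–Terry style objective, where the reward model is penalized when it scores the rejected response above the human-preferred one. A minimal sketch of that loss (the function name and scalar-reward setup are illustrative assumptions, not from the episode):

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss for training a reward model from
    pairwise preferences: low when the preferred (chosen) response
    is scored above the rejected one, high when it is misranked."""
    # Sigmoid of the reward margin: the modeled probability that the
    # chosen response is preferred over the rejected one.
    prob_chosen_preferred = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    # Negative log-likelihood of the observed human preference.
    return -math.log(prob_chosen_preferred)

# A correctly ranked pair (chosen scored higher) yields a small loss,
# while a misranked pair yields a large loss that pushes the model to correct it.
low = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
high = pairwise_preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
```

In practice the rewards would come from a neural scoring head over full prompt–response pairs and the loss would be averaged over a batch, but the comparison-based objective is the same.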
