Brandon Cui
Research Scientist at MosaicML and Databricks, leading RLHF post-training efforts. Expert in AI model optimization, reward models, and RLHF.
Best podcasts with Brandon Cui
Ranked by the Snipd community
12 snips
Mar 20, 2025
• 40min
Reward Models | Data Brew | Episode 40
Brandon Cui, a Research Scientist at MosaicML and Databricks, specializes in AI model optimization and leads RLHF post-training efforts. In this episode, he discusses how synthetic data and RLHF can be used to fine-tune models. He covers techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for improving model responses. Brandon also emphasizes the role of reward models in improving performance on coding, math, and reasoning tasks, and the need for human oversight in AI training.