Training Reward Models for Assistant Model Optimization

This chapter covers training a reward model on human-assistant dialogues to predict which of two completions is preferred, then using reinforcement learning to train the assistant model. It also covers fine-tuning on weak labels and studying how well the approach generalizes across tasks, with positive results.
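As a rough illustration of what "predicting preferences between completions" can mean, the sketch below uses a Bradley-Terry style pairwise loss, a common choice for reward models. The episode summary does not say which loss was used, so the formulation and the function names (`pairwise_loss`, `preference_probability`) are assumptions made for illustration only.

```python
import math


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen completion is preferred.

    Assumes a Bradley-Terry model: P(chosen > rejected) = sigmoid(r_c - r_r),
    so the loss is -log(sigmoid(diff)) = log(1 + exp(-diff)).
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Modelled probability that the chosen completion is preferred."""
    return math.exp(-pairwise_loss(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Hypothetical scalar rewards for two completions of the same prompt.
    print(preference_probability(1.5, 0.5))
    print(pairwise_loss(1.5, 0.5))
```

Minimizing this loss over many labelled comparison pairs pushes the reward model to score the preferred completion higher; those scores can then serve as the reward signal in the reinforcement learning stage.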
