Training AI with Human Feedback

This chapter explores how a machine learning algorithm can be trained using human feedback. The speakers discuss the challenges of feedback in AI alignment research and the ways the training system can fail. They also describe a three-part training process: humans rank video clips, a reward predictor learns how humans would rank different behaviors, and a reinforcement learning algorithm maximizes the predicted reward.
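The three-part process can be illustrated with a toy sketch. This is not the system discussed in the episode; it is a minimal illustration under several assumptions: each clip is reduced to a hypothetical three-number feature vector, the human is simulated by a hidden set of "true" weights, the reward predictor is a linear model trained with a Bradley-Terry style pairwise loss, and the reinforcement learning step is replaced by simply picking the candidate behavior with the highest predicted reward. All function names here are made up for illustration.

```python
import math
import random


def predicted_reward(weights, clip):
    """Linear reward model: weighted sum of the clip's features."""
    return sum(w * x for w, x in zip(weights, clip))


def preference_probability(weights, clip_a, clip_b):
    """Bradley-Terry style probability that clip_a is ranked above clip_b."""
    diff = predicted_reward(weights, clip_a) - predicted_reward(weights, clip_b)
    return 1.0 / (1.0 + math.exp(-diff))


def simulated_human(clip_a, clip_b, true_weights):
    """Stand-in for a human rater (assumption): prefers the clip with the
    higher hidden true reward. Returns 1.0 if clip_a is preferred, else 0.0."""
    reward_a = predicted_reward(true_weights, clip_a)
    reward_b = predicted_reward(true_weights, clip_b)
    return 1.0 if reward_a > reward_b else 0.0


def train_reward_predictor(comparisons, n_features, lr=0.1, epochs=50):
    """Fit the reward model by stochastic gradient ascent on the
    log-likelihood of the observed pairwise rankings."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for clip_a, clip_b, label in comparisons:
            p = preference_probability(weights, clip_a, clip_b)
            for i in range(n_features):
                weights[i] += lr * (label - p) * (clip_a[i] - clip_b[i])
    return weights


def run_demo(seed=0):
    rng = random.Random(seed)
    true_weights = [1.0, -2.0, 0.5]  # hidden preferences, hypothetical values
    clips = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]

    # Part 1: the (simulated) human ranks pairs of clips.
    comparisons = []
    for _ in range(200):
        clip_a, clip_b = rng.sample(clips, 2)
        label = simulated_human(clip_a, clip_b, true_weights)
        comparisons.append((clip_a, clip_b, label))

    # Part 2: the reward predictor learns to reproduce those rankings.
    weights = train_reward_predictor(comparisons, n_features=3)

    # Part 3: stand-in for reinforcement learning. A real agent would update
    # a policy; here we only select the behavior with the highest predicted reward.
    best_clip = max(clips, key=lambda clip: predicted_reward(weights, clip))
    return weights, best_clip, clips, comparisons, true_weights


if __name__ == "__main__":
    learned, best, _, _, _ = run_demo()
    print("learned reward weights:", learned)
    print("behavior chosen by predicted reward:", best)
```

The point of the sketch is the division of labor: the human only ever compares pairs, the predictor turns those comparisons into a numeric reward, and the optimizer sees only the predicted reward, never the human directly. That last property is also where failures can arise, since the optimizer will exploit any mismatch between predicted and intended reward.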

