
Learning From Human Preferences

AI Safety Fundamentals: Alignment

OpenAI has developed an algorithm that can infer what humans want by being told which of two proposed behaviours is better. The AI gradually builds a model of the task by finding the reward function that best explains the human's judgments, then uses reinforcement learning (RL) to learn how to achieve that goal. As its behaviour improves, it continues to ask for human feedback on the trajectory pairs it is most uncertain about. The resulting agents learned from human feedback to achieve strong, and sometimes superhuman, performance in many of the environments tested.
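The core of the approach is fitting a reward function to pairwise human judgments. Below is a minimal sketch of that step, not OpenAI's implementation: a small reward model scores each step of a trajectory segment, and the probability that one segment is preferred over another follows a Bradley-Terry model on the segments' predicted returns. The class and function names (`RewardModel`, `preference_loss`) and the fixed-size feature-vector representation of segments are illustrative assumptions.

```python
# Sketch of reward learning from pairwise preferences (assumed setup,
# not OpenAI's code). Segments are (T, obs_dim) feature tensors.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a per-step feature vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (T, obs_dim) -> predicted return = sum of step rewards
        return self.net(segment).sum()

def preference_loss(model, seg_a, seg_b, label):
    """Cross-entropy loss under a Bradley-Terry preference model.

    label = 1.0 if the human preferred segment A, 0.0 for segment B.
    P(A preferred) = sigmoid(R_A - R_B).
    """
    r_a = model.segment_return(seg_a)
    r_b = model.segment_return(seg_b)
    p_a = torch.sigmoid(r_a - r_b)
    return -(label * torch.log(p_a + 1e-8)
             + (1 - label) * torch.log(1 - p_a + 1e-8))

if __name__ == "__main__":
    torch.manual_seed(0)
    obs_dim = 8
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy preference pair: two random 20-step segments and a human label.
    seg_a, seg_b = torch.randn(20, obs_dim), torch.randn(20, obs_dim)
    label = torch.tensor(1.0)  # human said "A is better"

    loss = preference_loss(model, seg_a, seg_b, label)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"preference loss: {loss.item():.4f}")
```

In a full system the learned reward would then be handed to an RL algorithm as the training signal, and new query pairs would be chosen where the reward model is most uncertain; the published method reportedly measured that uncertainty via disagreement across an ensemble of reward predictors, whereas this sketch keeps a single model for brevity.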

