
Learning From Human Preferences

AI Safety Fundamentals: Alignment

OpenAI has developed an algorithm that can infer what humans want by being told which of two proposed behaviours is better. The AI gradually builds a model of the task by finding the reward function that best explains the human's judgments, then uses reinforcement learning (RL) to learn how to achieve that goal. As its behaviour improves, it continues to ask for human feedback on the trajectory pairs it is most uncertain about. The resulting agents learned from human feedback to achieve strong, and sometimes superhuman, performance in many of the environments tested.
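The core of the approach is fitting a reward function to pairwise human judgments. Below is a minimal sketch of that step, not OpenAI's implementation: a small reward model scores each step of a trajectory segment, and the probability that one segment is preferred over another follows a Bradley-Terry model on the segments' predicted returns. The class and function names (`RewardModel`, `preference_loss`) and the fixed-size feature-vector representation of segments are illustrative assumptions.

```python
# Sketch of reward learning from pairwise preferences (assumed setup,
# not OpenAI's code). Segments are (T, obs_dim) feature tensors.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a per-step feature vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (T, obs_dim) -> predicted return = sum of step rewards
        return self.net(segment).sum()

def preference_loss(model, seg_a, seg_b, label):
    """Cross-entropy loss under a Bradley-Terry preference model.

    label = 1.0 if the human preferred segment A, 0.0 for segment B.
    P(A preferred) = sigmoid(R_A - R_B).
    """
    r_a = model.segment_return(seg_a)
    r_b = model.segment_return(seg_b)
    p_a = torch.sigmoid(r_a - r_b)
    return -(label * torch.log(p_a + 1e-8)
             + (1 - label) * torch.log(1 - p_a + 1e-8))

if __name__ == "__main__":
    torch.manual_seed(0)
    obs_dim = 8
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy preference pair: two random 20-step segments and a human label.
    seg_a, seg_b = torch.randn(20, obs_dim), torch.randn(20, obs_dim)
    label = torch.tensor(1.0)  # human said "A is better"

    loss = preference_loss(model, seg_a, seg_b, label)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"preference loss: {loss.item():.4f}")
```

In a full system the learned reward would then be handed to an RL algorithm as the training signal, and new query pairs would be chosen where the reward model is most uncertain; the published method reportedly measured that uncertainty via disagreement across an ensemble of reward predictors, whereas this sketch keeps a single model for brevity.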

