Learning From Human Preferences

AI Safety Fundamentals: Alignment

OpenAI has developed an algorithm that can infer what humans want by being told which of two proposed behaviors is better. The AI gradually builds a model of the task by finding the reward function that best explains the human's judgments, then uses reinforcement learning (RL) to pursue that goal. As its behavior improves, it keeps asking for human feedback on the trajectory pairs it is most uncertain about. The resulting agents learn from human feedback to reach strong, and sometimes superhuman, performance in many of the environments tested.
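To make the reward-modelling step concrete, here is a minimal sketch of how a reward function can be fit to pairwise human preferences, assuming a PyTorch MLP and a Bradley-Terry style logistic loss over summed segment rewards (the names `RewardModel` and `preference_loss` are illustrative, not from the source):

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Small MLP mapping each (observation, action) step to a scalar reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim)
        # Sum predicted per-step rewards over each trajectory segment.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, prefs):
    """Logistic (Bradley-Terry) loss: the modelled probability that the human
    prefers segment A is the softmax of the two segments' summed rewards."""
    r_a = model(*seg_a)            # (batch,)
    r_b = model(*seg_b)            # (batch,)
    logits = r_a - r_b             # higher => A judged better
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

# Example: one gradient step on a batch of 8 labelled segment pairs of length 25.
obs_dim, act_dim, T, B = 4, 2, 25, 8
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
seg_b = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
prefs = torch.randint(0, 2, (B,)).float()   # 1.0 means the human preferred A
loss = preference_loss(model, seg_a, seg_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```

An RL algorithm (e.g. a policy-gradient method) would then optimise the policy against this learned reward, while new segment pairs on which the reward model is most uncertain are sent to the human for labelling.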
