Clearer Thinking with Spencer Greenberg

How can AIs know what we want if *we* don't even know? (with Geoffrey Irving)

Jan 24, 2024
Geoffrey Irving, an AI safety researcher at DeepMind with prior roles at OpenAI and Google Brain, delves into the challenge of aligning AI systems with human values. He discusses how AIs can misinterpret user intentions and the philosophical differences between acting as an assistant and acting as an autonomous agent. Irving also examines biases in AI training data, particularly their skew toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) cultures, and the potential for AI to manipulate human emotions. He emphasizes the need for diverse cultural representation and ethical guidelines to ensure responsible AI development.
INSIGHT

AI Alignment

  • Aligning AI systems means making them do what humans want.
  • This seemingly simple goal is complex due to the ambiguity of "what we want".
ADVICE

Focus on Assistance

  • Focus on designing AI systems for assistive tasks, rather than fully autonomous actions.
  • This simplifies defining and evaluating "good" behavior.
INSIGHT

Bridging Informational and Autonomous AI

  • One can bridge the gap between informational and autonomous AI by simulating human approval.
  • AI could list candidate actions, predict human endorsement of each, and act only when confident (see the sketch below).
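
A minimal sketch of that idea, assuming a hypothetical `predict_endorsement` function standing in for a learned model of human approval (the names, threshold, and toy predictor here are illustrative, not from the episode):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Action:
    description: str

def choose_action(
    candidates: List[Action],
    predict_endorsement: Callable[[Action], float],
    confidence_threshold: float = 0.9,
) -> Optional[Action]:
    """Pick the candidate the approval model is most confident a human
    would endorse; defer (return None) if nothing clears the threshold."""
    best = max(candidates, key=predict_endorsement, default=None)
    if best is not None and predict_endorsement(best) >= confidence_threshold:
        return best   # act autonomously: predicted approval is high enough
    return None       # defer to the human instead of acting

# Toy usage: a stand-in endorsement predictor. In practice this would be a
# learned model of human approval, which this sketch does not include.
if __name__ == "__main__":
    actions = [Action("send a draft reply"), Action("delete the email thread")]
    toy_predictor = lambda a: 0.95 if "draft" in a.description else 0.2
    chosen = choose_action(actions, toy_predictor)
    if chosen:
        print("Acting on:", chosen.description)
    else:
        print("Deferring to human.")
```

The design choice is that autonomy is gated by predicted human approval: the system stays in an informational, deferring mode unless the simulated endorsement is confidently positive.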