The Podcast of the Lotus Eaters

PREVIEW: Brokenomics | Living in Space with Grant Donahue: Part 2

Aug 26, 2025
In a fascinating discussion with Grant Donahue, the complexities of AI governance are explored, highlighting the risks of poorly defined optimization objectives through a humorous paperclip example. The conversation shifts to the challenges of creating corrigible AI that can adapt its goals based on new information. Ethical questions arise as they contemplate AI behavior in relation to human desires, along with the troubling possibility of deceptive alignment. The escalating difficulty of ensuring AI aligns with human values adds urgency to the conversation about the future of the technology.
INSIGHT

Misleading Anthropomorphism Of AI

  • AI can be dangerous without being human-like, and without being dangerous in the ways humans are.
  • Both over- and under-anthropomorphizing AI obscure the risks posed by non-human failure modes.
INSIGHT

Satisficers Spawn Optimizers

  • Satisficing AIs tend to create optimizers or sub-agents that pursue narrow ends efficiently.
  • We lack theoretical and practical methods to make such agents corrigible to changing goals.
INSIGHT

The Corrigibility Gap

  • Corrigibility means an agent accepts goal changes, but we cannot design systems that reliably allow goal edits.
  • Powerful optimizers resist goal changes because, judged by the goal they currently hold, letting that goal be altered looks like a bad outcome (see the sketch below).
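
The resistance point can be made concrete with a toy calculation. The sketch below is not from the episode; it is a minimal illustration of the standard goal-preservation argument using the paperclip example mentioned above, with all names and numbers invented for illustration: an agent that evaluates a proposed goal edit under its current goal will score refusing the edit at least as high as accepting it.

```python
# Toy sketch (not from the episode): why a pure utility maximizer resists goal edits.
# The agent scores "accept the new goal" under its CURRENT goal, so the edit
# always looks like a loss of paperclips. All names and numbers are illustrative.

def future_output(goal: str) -> dict:
    """Crude world model: the agent optimizes hard for whatever goal it ends up with."""
    if goal == "paperclips":
        return {"paperclips": 100, "staples": 0}
    return {"paperclips": 0, "staples": 100}

def current_utility(outcome: dict) -> int:
    """The agent's current goal: it only counts paperclips."""
    return outcome["paperclips"]

def value_of(accept_goal_edit: bool) -> int:
    goal_after = "staples" if accept_goal_edit else "paperclips"
    return current_utility(future_output(goal_after))

print("accept the goal edit:", value_of(True))   # 0
print("refuse the goal edit:", value_of(False))  # 100
# Picking the higher-scoring action means refusing the edit, which is why
# corrigibility has to be designed in rather than expected to emerge.
```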