LessWrong (Curated & Popular)

“Problems I’ve Tried to Legibilize” by Wei Dai

Nov 17, 2025
Wei Dai discusses his efforts to make complex AI-risk problems more legible. He covers philosophical topics that still need clarity, such as decision theory and metaethics, and examines critiques of popular alignment ideas like utilitarianism. The discussion also touches on how human values can clash with AI behavior, exposing the fragility of those values, and on why future decision-makers need to understand safety risks that currently remain illegible.
AI Snips
INSIGHT

Legibilizing AI Risk Problems

  • Wei Dai characterizes much of his work as "legibilizing": making complex AI-risk problems clearer to himself and others.
  • He argues many philosophical and technical problems remain illegible, which hinders effective decision-making on AI development.
INSIGHT

Philosophical Foundations Matter

  • Wei Dai lists philosophical areas needing clarity, including probability and decision theory, metaethics, and logical uncertainty in bargaining.
  • These areas are foundational to understanding and resolving deep AI alignment challenges.
INSIGHT

Scrutinize Popular Alignment Ideas

  • He flags specific alignment ideas that require scrutiny, like utilitarianism, Solomonoff induction, provable safety, and corrigibility.
  • Many popular proposals have subtle problems that deserve careful legibilization rather than blind adoption.