LessWrong (Curated & Popular)

“Problems I’ve Tried to Legibilize” by Wei Dai

Nov 17, 2025
Wei Dai discusses his efforts to make complex AI-risk problems more legible. He covers philosophical topics that still need clarity, such as decision theory and metaethics, and examines critiques of popular alignment ideas like utilitarianism. The discussion also touches on how human values can clash with AI behavior, exposing the fragility of those values, and on why future decision-makers need to understand safety risks that currently remain illegible.
AI Snips
INSIGHT

Legibilizing AI Risk Problems

  • Wei Dai characterizes much of his work as "legibilizing": making complex AI-risk problems clearer to himself and others.
  • He argues many philosophical and technical problems remain illegible, which hinders effective decision-making on AI development.
INSIGHT

Philosophical Foundations Matter

  • Wei Dai lists philosophical areas needing clarity, including probability and decision theory, metaethics, and logical uncertainty in bargaining.
  • These areas are foundational to understanding and resolving deep AI alignment challenges.
INSIGHT

Scrutinize Popular Alignment Ideas

  • He flags specific alignment ideas that require scrutiny, like utilitarianism, Solomonoff induction, provable safety, and corrigibility.
  • Many popular proposals have subtle problems that deserve careful legibilization rather than blind adoption.