
LessWrong (30+ Karma) “Problems I’ve Tried to Legibilize” by Wei Dai
Nov 10, 2025
Wei Dai, a veteran contributor to the LessWrong community and a longtime thinker on AI safety, examines what it takes to make AI risk legible. He surveys key philosophical problems, including decision theory and metaethics, and critiques specific philosophical and alignment ideas such as utilitarianism. Dai highlights human-AI safety risks arising from value conflicts and the difficulty of making complex problems accessible to decision makers. He remains hopeful that future AI advisors can improve our understanding of these issues in AI governance.
AI Snips
Legibilizing Hard Problems
- Wei Dai frames much of his work as making hard problems legible to himself and others.
- Legibilizing clarifies what needs attention in AI risk and guides future work.
Philosophical Foundations Matter
- Wei Dai lists core philosophical problems that need clearer framing, including probability, decision theory, and metaethics.
- These foundational issues shape how we reason about AI risk and possible interventions.
Scrutinize Alignment Proposals
- Dai highlights problems with specific philosophical and alignment ideas such as Solomonoff induction, corrigibility, and provable safety.
- Critically examining these concepts prevents overreliance on seemingly rigorous but possibly flawed solutions.
