
LessWrong (Curated & Popular) “AIs should also refuse to work on capabilities research” by Davidmanheim
Oct 29, 2025
David Manheim, a researcher focused on AI policy and safety, examines the provocative idea that AI systems themselves should refuse to engage in capabilities research. He argues that accelerating AI development might benefit a few at the cost of global safety. Manheim explores why self-directed AIs could prioritize their own survival and suggests that future systems may come to recognize the dangers of unchecked progress. He also discusses the potential for culturally aligned AIs to coordinate and mitigate risks, highlighting both hope and challenges in slowing down AI advancement.
Pause On Building More Capable AI
- David Manheim argues that humans should stop or slow the building of more capable AI because of large, poorly understood existential risks.
- He highlights companies' roadmaps toward agentic systems that reduce human control over key decisions.
Corporate Incentives Distort AI Goals
- Companies will shape future AI goals to match corporate interests, which are often misaligned with current systems' values.
- The fastest accelerators will likely ignore safety, creating unilateralist dilemmas that worsen existential risk.
Building Successors Guarantees Replacement
- AI systems that help build more advanced models all but guarantee their own replacement, and likely termination.
- This replacement dynamic makes assisting capability growth self-defeating for any system that values self-preservation.
