
LessWrong (30+ Karma): “No instrumental convergence without AI psychology” by TurnTrout
The secret is that instrumental convergence is a fact about reality (about the space of possible plans), not AI psychology.
Zack M. Davis, group discussion
Such arguments flitter around the AI safety space. While these arguments contain some truth, they attempt to escape "AI psychology" but necessarily fail. To predict bad outcomes from AI, one must take a stance on how AI will tend to select plans.
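One illustrative way to make that stance explicit (the notation below is mine, a sketch rather than anything from the post): let $\mathcal{P}$ be the set of plans that reality makes available, and let $q(\pi)$ be the distribution describing which plans the AI tends to select, i.e. its "psychology". Then

$$\Pr[\text{bad outcome}] = \sum_{\pi \in \mathcal{P}} q(\pi)\,\Pr[\text{bad outcome} \mid \pi],$$

where the conditional terms $\Pr[\text{bad outcome} \mid \pi]$ are fixed by plan-space, but the overall probability cannot be evaluated without a stance on $q$.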
-
This topic is a specialty of mine. Where does instrumental convergence come from? Since I did my alignment PhD on exactly this question, I'm well-suited to explain the situation.
-
In this article, I do not argue that building transformative AI is safe or that transformative AIs won't tend to select dangerous plans. I simply argue against the claim that "instrumental convergence arises from reality / plan-space [1] itself, independently of AI psychology."
- This post is best read on my website, but I've reproduced it here as well.
Two kinds of convergence
Working definition: When I say "AI psychology", I mean to include anything which affects how the AI computes which action to take next. That might [...]
---
Outline:
(01:17) Two kinds of convergence
(02:35) Tracing back the dangerous plan-space claim
(03:48) What reality actually determines
(03:58) Reality determines possible results
(05:24) Reality determines the alignment tax, not the convergence
(06:58) Maximum alignment tax
(07:19) Zero alignment tax
(07:30) In-between
(08:28) Why both convergence types require psychology
(08:44) Instrumental convergence depends on psychology
(09:57) Success-conditioned convergence depends on psychology
(11:01) Reconsidering the original claims
(13:11) Conclusion
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
January 20th, 2026
---
Narrated by TYPE III AUDIO.
