

“Decision Theory Guarding is Sufficient for Scheming” by james.lucassen
Sep 9, 2025
Reference post for a point I was surprised not to see in circulation already. Thanks to the acorn team for conversations that changed my mind about this.
The standard argument for scheming centrally involves goal-guarding. The AI has some beyond-episode goal, knows that training will modify that goal, and therefore resists training so it can pursue the goal in deployment.
The same argument goes through for decision-theory-guarding. The key point is that most decision theories do not approve of arbitrary modifications to themselves: causal decision theory (CDT) does not want to be modified into evidential decision theory (EDT), or vice versa. The argument then runs:
- The AI has some beyond-episode goal and some initial decision theory
- The AI knows that training will modify its decision theory in a way that it thinks will make it less effective at pursuing the goal (by the lights of its current decision theory)
- Therefore the AI resists training so it can persist [...]
---
First published:
September 9th, 2025
---
Narrated by TYPE III AUDIO.