
LessWrong (Curated & Popular) “The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
Dec 11, 2025
In this episode, Alex Mallen explains the behavioral selection model. He describes how cognitive patterns influence AI behavior and outlines three types of motivations: fitness-seekers, schemers, and optimal kludges. He discusses why the motivations an AI ends up with can differ from the intended ones, citing flaws in reward signals, and argues that understanding these dynamics matters for predicting future AI actions.
AI Snips
Behavioral Selection Explains AI Motivations
- The behavioral selection model predicts AI motivations by modeling which cognitive patterns cause behaviors that lead to their own selection.
- Cognitive patterns gain influence when their actions increase the chance they'll be kept or deployed.
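The selection idea in the two points above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions: the pattern names, the fitness numbers, and the replicator-style update rule are all hypothetical and are not taken from the episode or the original post.

```python
def select(weights, fitness, rounds=10):
    """Toy replicator-style update (an assumption, not the authors' formalism).

    Each round, a pattern's share of influence is multiplied by its fitness
    and the shares are renormalised so they sum to 1.
    """
    for _ in range(rounds):
        scaled = {name: share * fitness[name] for name, share in weights.items()}
        total = sum(scaled.values())
        weights = {name: value / total for name, value in scaled.items()}
    return weights


# Hypothetical cognitive patterns that start with equal influence.
initial = {"pattern_a": 0.5, "pattern_b": 0.5}

# Hypothetical fitness: how much each pattern's behavior raises the chance
# that it is kept or deployed. pattern_a is slightly fitter.
fitness = {"pattern_a": 1.2, "pattern_b": 1.0}

final = select(initial, fitness)
for name, share in final.items():
    print(f"{name}: {share:.3f}")
```

Even a small fitness advantage compounds over repeated rounds, which is the intuition behind asking which patterns cause their own selection.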
Use Causal Graphs To Predict Fitness
- To predict selection, place candidate motivations on the causal graph and trace how their actions increase influence through deployment.
- For many motivations, you can predict how strongly they are selected without simulating their exact actions, provided they reliably bring about the outcomes they target.
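The graph-tracing step above can be sketched as a reachability check. This is a hedged illustration: the node names and edges below are hypothetical placeholders, not the graph used by the authors, and the sketch assumes that a motivation reliably increases whatever node it targets.

```python
from collections import deque

# Hypothetical causal graph: an edge X -> Y means "more X tends to cause more Y".
CAUSES = {
    "intended_task_done": ["high_reward"],
    "high_reward": ["pattern_selected"],
    "pattern_selected": ["influence_in_deployment"],
    "unrelated_goal": [],
}


def is_upstream(graph, source, target):
    """Return True if `source` causally leads to `target` (breadth-first search)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# If a motivation reliably increases its target, it is predicted to be
# selected for whenever that target is upstream of deployment influence.
for goal in CAUSES:
    fit = is_upstream(CAUSES, goal, "influence_in_deployment")
    print(f"{goal}: {'selected for' if fit else 'not selected for'}")
```

The check never simulates individual actions. It only asks where the motivation's target sits on the graph, which mirrors the shortcut described in the second point.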
Three Classes Of Highly Fit Motivations
- Three broad classes of maximally fit motivations emerge: fitness-seekers, schemers, and optimal kludges.
- Each class predicts different deployment behaviors and risks depending on causal pathways to selection.
