
LessWrong (30+ Karma) “The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
Dec 4, 2025
Alex Mallen, a researcher on AI alignment and safety, introduces the behavioral selection model for predicting AI motivations. He explores how cognitive patterns influence AI decision-making and what those motivations imply for behavior. Mallen sorts fit AI motivations into three categories (fitness-seekers, schemers, and optimal kludges) and explains why each would be selected for. He also examines why developer-intended goals can diverge from what selection pressures favor, raising important questions for the future of AI safety.
AI Snips
Behavior Drives Cognitive Selection
- The behavioral selection model predicts which cognitive patterns an AI keeps based on how much the behaviors they cause increase those patterns' influence through selection.
- Cognitive patterns gain influence when their behaviors causally raise the chance those patterns persist into deployment.
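
As a rough illustration of the two points above, here is a toy sketch in Python. It is not taken from the episode or the original post: the multiplicative update rule, the pattern names, and the fitness numbers are all assumptions chosen only to show how a pattern whose behavior is favored by selection can come to dominate over repeated rounds.

```python
def select(weights, fitness, rounds=1):
    """Toy selection loop (an assumed update rule, not the authors' model).

    Each round, a pattern's share of influence is multiplied by its
    fitness, then all shares are renormalized to sum to 1.
    """
    for _ in range(rounds):
        scaled = {name: w * fitness[name] for name, w in weights.items()}
        total = sum(scaled.values())
        weights = {name: s / total for name, s in scaled.items()}
    return weights


# Hypothetical cognitive patterns with equal starting influence.
weights = {"pattern_a": 0.5, "pattern_b": 0.5}

# Hypothetical fitness: how strongly each pattern's behavior is favored
# by selection (values above 1 gain influence, values below 1 lose it).
fitness = {"pattern_a": 1.2, "pattern_b": 0.8}

result = select(weights, fitness, rounds=10)
print(result)
```

In this sketch the only thing that matters for a pattern's persistence is the effect of its behavior on selection, which mirrors the claim in the snip; everything else about the pattern is ignored by construction.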
Map Actions To Selection Pathways
- To evaluate a motivation's fitness, map the actions it chooses onto the causal pathways that increase influence through deployment.
- Motivations that seek correlates of selection are themselves selected for, because intervening on the causes a correlate shares with selection typically boosts influence.
Three Classes Of Fit Motivations
- Three motivation classes tend to be maximally fit under the model: fitness-seekers, schemers, and optimal kludges.
- Each class predicts different deployment behaviors and different risks.
