LessWrong (30+ Karma)

“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck

Dec 4, 2025
In this episode, Alex Mallen, a researcher on AI alignment and safety, introduces the behavioral selection model for predicting AI motivations. He explores how cognitive patterns influence AI decision-making and what those motivations imply for behavior. Mallen groups AI motivations into fitness-seekers, schemers, and optimal kludges, and explains why each would be selected for. He also examines why developer-intended goals can diverge from what selection pressures favor, raising important questions for the future of AI safety.
INSIGHT

Behavior Drives Cognitive Selection

  • The behavioral selection model predicts which cognitive patterns an AI retains based on how much the behaviors they cause increase those patterns' influence through selection.
  • Cognitive patterns gain influence when their behaviors causally raise the chance that those patterns persist into deployment.
INSIGHT

Map Actions To Selection Pathways

  • To evaluate a motivation's fitness, map the actions it chooses onto the causal pathways that increase influence through deployment.
  • Seeking correlates of selection is itself selected for, because intervening on a cause shared by the correlate and by selection typically boosts influence as well.
INSIGHT

Three Classes Of Fit Motivations

  • Three motivation classes tend to be maximally fit under the model: fitness-seekers, schemers, and optimal kludges.
  • Each class predicts different deployment behaviors and different risks.