Joe Edelman, a researcher focused on AI alignment, shares his insights on designing social systems that promote human flourishing. He discusses pluralism as a core design principle and critiques conventional voting and market models for capturing only shallow preference signals. Edelman emphasizes 'thick models of value,' arguing that genuine values encompass deeper reasons and norms. He also addresses the risk of AI assistants manipulating their users and proposes solutions such as value-aware markets for navigating societal challenges. A thought-provoking conversation on the future of AI and governance!
INSIGHT
Preferences Are Insufficient Signals
Preference-based models (clicks, votes) give only shallow signals about people and miss why choices are made.
Joe Edelman argues that alignment needs richer information, such as values and norms, to guide AI and institutions.
INSIGHT
Language Alone Can Be Too Vague
Text-based specifications are expressive but often vague and underspecified for high-stakes norms.
Single-word goals like "helpful" or "harmless" leave room for divergent cultural interpretation and for manipulation.
ADVICE
Be Precise When Specifying Model Behavior
Use philosophy and cognitive science to craft prompts that actually specify norms and values.
Measure how much detail is required rather than relying on vague words like "helpful."
Jim talks with Joe Edelman about the ideas in the Meaning Alignment Institute's recent paper "Full Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value." They discuss pluralism as a core principle in designing social systems, the informational basis for alignment, how preference-based models fail to capture what people truly care about, the limitations of markets and voting as preference-based systems, critiques of text-based approaches in LLMs, thick models of value, values as attentional policies, AI assistants as potential vectors for manipulation, the need for reputation systems and factual grounding, the "super negotiator" project for better contract negotiation, multipolar traps, moral graph elicitation, starting with membranes, Moloch-free zones, unintended consequences and lessons from early Internet optimism, concentration of power as a key danger, co-optation risks, and much more.
Episode Transcript
"A Minimum Viable Metaphysics," by Jim Rutt (Substack)
Jim's Substack
JRS Currents 080: Joe Edelman and Ellie Hain on Rebuilding Meaning
Meaning Alignment Institute
If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, by Eliezer Yudkowsky and Nate Soares
"Full Stack Alignment: Co-aligning AI and Institutions with Thick Models of Value," by Joe Edelman et al.
"What Are Human Values and How Do We Align AI to Them?" by Oliver Klingefjord, Ryan Lowe, and Joe Edelman
Joe Edelman has spent much of his life trying to understand how ML systems and markets could change, retaining their many benefits while avoiding their characteristic problems of atomization and of serving shallow desires over deeper needs. Along the way, this led him to formulate theories of human meaning and values (https://arxiv.org/abs/2404.10636), study models of societal transformation (https://www.full-stack-alignment.ai/paper), invent the meaning-based metrics used at CouchSurfing, Facebook, and Apple, co-found the Center for Humane Technology and the Meaning Alignment Institute, and invent new democratic systems (https://arxiv.org/abs/2404.10636). He’s currently one of the PIs leading the Full-Stack Alignment program at the Meaning Alignment Institute, with a network of more than 50 researchers at universities and corporate labs working on these issues.