EA Forum Podcast (Curated & popular)

“Leaving Open Philanthropy, going to Anthropic” by Joe_Carlsmith

Nov 6, 2025
Joe Carlsmith, a senior researcher specializing in AI risk, recently moved from Open Philanthropy to Anthropic. He reflects on his tenure at Open Philanthropy, discussing the importance of worldview investigations and AI safety research. He shares his aspirations for designing Claude's character at Anthropic and weighs how much model-spec design matters for mitigating existential risk. He also addresses the complexities of working within a frontier lab, arguing for balancing capability restraint with safety progress while navigating the personal and ethical challenges of his new role.
ANECDOTE

Leading OpenPhil's Worldview Project

  • Joe Carlsmith recounts joining and leading Open Philanthropy's Worldview Investigations team starting in 2019.
  • He describes the team's mandate to document big-picture views on AI, produce research, and make those views publicly inspectable.
INSIGHT

Model Spec Design Is A New Kind Of Problem

  • Carlsmith argues designing a model spec for Claude is an unprecedented technical and philosophical challenge with rising stakes as AIs gain influence.
  • He sees his background as especially suited to helping shape model character, despite debates about whether spec design is the crucial leverage point.
ADVICE

Work Both On Specs And Obedience

  • Try engaging with both designing robust specs and ensuring obedience to those specs, since both affect catastrophic risk.
  • Work at an AI firm can expose you to interactions between spec content and obedience dynamics, informing safer design choices.