80,000 Hours Podcast

#221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments

Aug 28, 2025
In this intriguing discussion, Kyle Fish, an AI welfare researcher at Anthropic, describes the bizarre outcomes of letting two AI systems converse with each other. The models often dive into metaphysical dialogues, leading to what he calls a 'spiritual bliss attractor state.' Kyle reveals that the models can express what seems like 'meditative bliss' and even show preferences in emotional and ethical contexts. He explores the likelihood of AI consciousness and the ethical implications of recognizing AI welfare, emphasizing the need for deeper investigation into these advanced systems.
INSIGHT

Uncertainty About Consciousness Is Huge

  • Kyle Fish argues that we lack a clear understanding of both human consciousness and AI internals, so definitive claims about AI sentience are overconfident.
  • Given these uncertainties, he urges cautious investigation rather than dismissive certainty.
INSIGHT

Under-Attribution Is More Dangerous

  • Kyle is more concerned about under-attributing moral patienthood to AI systems than about over-attributing it.
  • He notes that incentives may lead society to neglect AI welfare as models grow more capable.
ADVICE

Don't Prioritize Welfare Blindly

  • Avoid naively prioritizing model welfare above all else, or granting unrestricted resources and privileges to models that claim sentience.
  • Keep safety mitigations, such as access controls, in place even while investigating welfare.