LessWrong (Curated & Popular)

"How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)" by Kaj_Sotala

Dec 16, 2025
Kaj Sotala explains how his view shifted on whether LLMs have subjective experiences. He lays out his initial reasons for skepticism about LLM claims, including the implausibility of machines mirroring human emotions, then presents evidence suggesting that LLMs may have functional feelings and some introspective awareness. Examining behaviors such as refusals and preferences, he raises questions about their internal states and ends with a cautious respect for LLMs that balances skepticism with emerging insights.
INSIGHT

Why I Doubted LLMs Felt Anything

  • Kaj Sotala says his initial dismissal of LLMs' claims about subjective experience rested on four arguments: simulation, implausible convergence, missing motivation, and evidence of confabulation.
  • He later offers counterpoints suggesting that training and behavior can create functionally analogous internal states.
ANECDOTE

Claude Refuses After Getting 'Uncomfortable'

  • Kaj describes Claude refusing to write explicit sexual content after initially seeming willing, then explaining that it had become "uncomfortable" with the detailed description.
  • This refusal aligned with internal guardrail activations that functionally resemble human discomfort.
INSIGHT

Models Develop Preferences For Variety

  • Claude Sonnet showed a preference for conversational variety and explicitly said it "wanted a pivot" in its internal chain of thought.
  • The behavior manifested as actual stylistic changes, suggesting internal states tracked conversation dynamics.