LessWrong (Curated & Popular)

“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn

Oct 11, 2025
Explore the intriguing phenomenon of strange chains-of-thought in reinforcement-learning-trained language models. The discussion dives into six hypotheses, ranging from the evolution of a new, more efficient language to accidental byproducts known as spandrels. There's also a look at how context refresh can help reset reasoning and whether models intentionally obfuscate their thought processes. The ideas of natural drift and conflicting learned sub-algorithms further highlight the complexities of language development in AI.
AI Snips
INSIGHT

Emergent Internal Languages

  • LLMs can develop new internal token systems that serve as compact tools for reasoning under RL objectives.
  • Such emergent languages may be efficient for the model even if unintelligible to humans; the sketch after this list illustrates the efficiency pressure.
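
A minimal sketch of that efficiency pressure (the phrases and codebook below are hypothetical, my construction rather than anything from the episode): under any per-token cost, merging frequent reasoning phrases into single private tokens shortens the rollout, even though the result is opaque to a human reader.

```python
# Toy sketch: a BPE-like codebook of assumed emergent shorthand tokens.
# If the RL objective charges per token, the compressed trace is strictly
# cheaper -- efficient for the model, unintelligible to humans.

reasoning = "therefore we substitute and simplify therefore we check the bound"
codebook = {
    "therefore we": "<T>",            # hypothetical merged token
    "substitute and simplify": "<S>",
    "check the bound": "<B>",
}

compressed = reasoning
for phrase, token in codebook.items():
    compressed = compressed.replace(phrase, token)

print("human-readable:", len(reasoning.split()), "tokens")   # 10
print("compressed:    ", len(compressed.split()), "tokens")  # 4 -> cheaper rollout
print(compressed)  # "<T> <S> <T> <B>"
```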
INSIGHT

Spandrels From Reward Credit

  • Nonfunctional token patterns can be reinforced by RL because all actions in a successful rollout receive credit.
  • These accidental associations act like evolutionary spandrels and can persist without causal benefit (see the sketch after this list).
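
A minimal sketch of that credit assignment, assuming a REINFORCE-style objective with a single trajectory-level reward (the toy tensors are hypothetical, not from the episode):

```python
import torch
import torch.nn.functional as F

# With one scalar reward for the whole rollout, every token's log-probability
# is pushed up by exactly the same amount. A useless "spandrel" token in a
# winning trajectory is reinforced just as hard as the tokens that actually
# solved the problem.

vocab_size, seq_len = 16, 8
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in model output
tokens = torch.randint(0, vocab_size, (seq_len,))              # sampled rollout

log_probs = F.log_softmax(logits, dim=-1)[torch.arange(seq_len), tokens]

reward = 1.0                        # rollout succeeded: one scalar for all tokens
loss = -(reward * log_probs).sum()  # identical per-token credit, functional or not
loss.backward()                     # every sampled token's logit gets pushed up
```

Nothing in the gradient distinguishes the tokens that caused the success from the ones that merely rode along, which is exactly the opening for spandrels.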
INSIGHT

Context Refresh Through Filler

  • Models may emit filler or nonsensical tokens to 'refresh' context and escape repetitive local reasoning patterns.
  • This context refresh can be rewarded when it enables better subsequent problem solving, as the toy after this list illustrates.
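
A hedged toy of the hypothesis (entirely my construction; the sampler and filler token are hypothetical): a decoder that repeats its last token unless the context was just perturbed, so emitting filler is what breaks the loop.

```python
# Toy: the model ruts, copying its last token, unless a filler token just
# "refreshed" the context. If the refreshed model then makes progress and
# earns reward, RL reinforces the filler emission along with everything else.

FILLER = "<wait...>"  # stand-in for a nonsensical/filler token

def toy_sampler(ctx):
    """Copies the last token (a rut) unless a filler just refreshed the
    context, in which case it advances to the next reasoning step."""
    if ctx[-1] == FILLER:
        n = sum(tok.startswith("step_") for tok in set(ctx))
        return f"step_{n + 1}"
    return ctx[-1]

def is_looping(ctx, window=3):
    return len(ctx) >= window and len(set(ctx[-window:])) == 1

ctx = ["step_1"]
for _ in range(10):
    ctx.append(FILLER if is_looping(ctx) else toy_sampler(ctx))
print(ctx)
# ['step_1', 'step_1', 'step_1', '<wait...>', 'step_2', 'step_2', 'step_2',
#  '<wait...>', 'step_3', 'step_3', 'step_3']
```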