LessWrong (Curated & Popular)

“Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI” by Kaj_Sotala

Apr 17, 2025
Kaj Sotala, an AI researcher and writer, dives into the surprising reasoning failures of large language models (LLMs). He highlights issues like flawed logic in problem-solving, struggles with following simple instructions, and inconsistent storytelling, particularly in character portrayal. Kaj argues that despite advancements, LLMs still lack the capabilities necessary for true artificial general intelligence. He emphasizes the need for qualitative breakthroughs, rather than just iterative improvements, to address these profound challenges in AI development.
AI Snips
INSIGHT

LLMs Lack True Generalization

  • Current LLMs fail at reasoning tasks in surprising ways despite their other capabilities, suggesting they lack true generalization.
  • Their training doesn't generalize well to simple novel reasoning problems, implying limits on progress toward AGI.
ANECDOTE

LLMs Struggle with Sliding Puzzles

  • Claude and other LLMs repeatedly failed to solve a sliding puzzle, making illegal or nonsensical moves.
  • Even after multiple self-checks, these models could not find valid or optimal solutions.
ANECDOTE

Inconsistent Following of Instructions

  • Claude struggled to consistently follow simple coaching instructions, such as asking only one question at a time.
  • GPT models up through GPT-4o also failed, while GPT-4.5 might have improved but needs more testing.