Are Reasoning LLMs Changing The Game? (Ep. 506)

The Daily AI Show

Evaluating AI Reasoning: LLMs in Action

This chapter explores the effectiveness of different large language models (LLMs) in tasks like invoice comparison, highlighting discrepancies in performance between model versions. The discussion also raises questions about the nature of AI reasoning and how it compares to human cognition, emphasizing the challenges of benchmarking AI capabilities.

Episode notes

Want to keep the conversation going?

Join our Slack community at thedailyaishowcommunity.com

The team explores whether today’s AI models are just simulating thought or actually beginning to “think.” They break down advances in reasoning models, reinforcement learning, and world modeling, and debate whether AI’s step-by-step problem-solving can fairly be called thinking. The discussion covers philosophy, practical use cases, and why the definition of “thinking” itself might need rethinking.

Key Points Discussed

Early chain-of-thought prompting looked like reasoning but was just simulated checklists, exposing AI’s explainability problem.

Modern LLMs now demonstrate intrinsic deliberation, spending compute to weigh alternatives before responding.

Reinforcement learning trains models to value structured thinking, not just the right answer, helping them plan steps and self-correct.

Deduction, induction, abduction, and analogical reasoning methods are now modeled explicitly in advanced systems.

The group debates whether this step-by-step reasoning counts as “thinking” or is merely sophisticated processing.

Beth notes that models lack personal perspective or sensory grounding, limiting comparisons to human thought.

Karl stresses client perception—many non-technical users interpret these models’ behavior as thinking.

Brian draws a line at novel output—until models produce ideas outside their training data, it remains prediction.

Andy argues that if we call human reasoning “thinking,” then machine reasoning using similar steps deserves the label too.

Symbolic reasoning, code execution, and causality representation are key to closing the reasoning gap.

Memory, world models, and external tool access push models toward human-like problem solving.

Yann LeCun’s view that embodied AI will be required for human-level reasoning features heavily in the discussion.

The debate surfaces differing views: practical usefulness vs. philosophical accuracy in labeling AI behavior.

Conclusion: AI as a “process engine” may satisfy both camps, but the line between reasoning and thinking is getting blurry.

Timestamps & Topics

00:00:00 🧠 Reasoning models vs. chain-of-thought prompts

00:02:05 💡 Native deliberation as a breakthrough

00:03:15 🏛️ Thinking Fast and Slow analogy

00:05:14 🔍 Deduction, induction, abduction, analogy

00:07:03 🤔 Does problem-solving = thinking?

00:09:00 📜 Legal hallucination as reasoning failure

00:12:41 ⚙️ Symbolic logic and code interpreter role

00:16:36 🛠️ Deterministic vs. generative outcomes

00:20:05 📊 Real-world use case: invoice validation

00:23:06 💬 Why non-experts believe AI “thinks”

00:26:08 🛤️ Reasoning as multi-step prediction

00:29:47 🎲 AlphaGo’s strange but optimal moves

00:32:14 🧮 Longer processing vs. actual thought

00:35:10 🌐 World models and sensory grounding gap

00:38:57 🎨 Human taste and preference vs. AI outputs

00:41:47 🧬 Creativity as human advantage—for now

00:44:30 📈 Karl’s business growth powered by o3 reasoning

00:47:01 ⚡ Future: lightning-speed multi-agent parallelism

00:51:15 🧠 Memory + prediction defines thinking engines

00:53:16 📅 Upcoming shows preview and community CTA

#ThinkingMachines #LLMReasoning #ChainOfThought #ReinforcementLearning #WorldModeling #SymbolicAI #AIphilosophy #AIDebate #AgenticAI #DailyAIShow

The Daily AI Show Co-Hosts:

Andy Halliday, Beth Lyons, Brian Maucere, Jyunmi Hatcher, and Karl Yeh
