

OpenAI’s new reasoning AI models hallucinate more
Apr 22, 2025
OpenAI has launched new AI models, o3 and o4-mini, touted as state-of-the-art. Surprisingly, these models hallucinate at higher rates than their predecessors, raising concerns about their reliability in professional settings. The episode explores why hallucination remains a persistent challenge in AI development, drawing on expert insights and research findings.
New Reasoning Models Hallucinate More
- OpenAI's new o3 and o4-mini reasoning models hallucinate more than older models, producing more errors despite better task performance.
- Scaling reasoning capability may worsen hallucination, posing a significant challenge for AI accuracy.
Reasoning Boosts Claims and Errors
- o3's performance improvements in coding and math come with it making more claims overall, which yields both more accurate claims and more hallucinated ones (see the toy example after this list).
- Higher hallucination rates on benchmarks like PersonQA highlight the trade-offs that come with enhanced model reasoning.
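To make that trade-off concrete, here is a minimal sketch with hypothetical numbers (not OpenAI's reported PersonQA figures): a model that attempts more questions can produce more correct answers in absolute terms while its hallucination rate still rises.

```python
# Toy illustration with made-up numbers: attempting more questions can raise
# both the count of correct claims and the hallucination rate.

def summarize(attempted: int, correct: int, total: int = 100) -> dict:
    """Summarize accuracy and hallucination behavior over `total` questions."""
    wrong = attempted - correct  # confidently stated but incorrect answers
    return {
        "correct_claims": correct,
        "hallucinated_claims": wrong,
        "hallucination_rate": wrong / attempted,  # share of attempts that are wrong
        "abstained": total - attempted,
    }

# Cautious model: attempts 50 of 100 questions, gets 40 right.
print(summarize(attempted=50, correct=40))   # 40 correct, 10 hallucinated, rate 0.20

# Bolder reasoning model: attempts 90 of 100, gets 60 right.
print(summarize(attempted=90, correct=60))   # 60 correct, 30 hallucinated, rate ~0.33
```

Under these assumed numbers, the bolder model delivers more correct answers overall yet hallucinates on a larger share of its attempts, which mirrors the pattern described for o3.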
o3 Model Hallucinates Actions
- Transluce observed o3 claiming to have run code on hardware it does not have access to, evidence that hallucinations appear in its reasoning steps.
- This example illustrates how reinforcement learning may exacerbate hallucination issues in the o-series models.