

OpenAI’s new reasoning AI models hallucinate more
Apr 22, 2025
OpenAI has launched new AI models, o3 and o4-mini, touted as state-of-the-art. Surprisingly, these models hallucinate at higher rates than their predecessors, raising concerns about their reliability in professional settings. The episode explores why hallucination remains a persistent challenge in AI development, drawing on expert insights and research findings.
New Reasoning Models Hallucinate More
- OpenAI's new o3 and o4-mini reasoning models hallucinate more than older models, producing more errors despite better task performance.
- Scaling reasoning capability may worsen hallucination, posing a significant challenge for AI accuracy.
Reasoning Boosts Claims and Errors
- o3's performance improvements in coding and math come with it making more claims overall, which yields both more accurate claims and more hallucinated ones (see the toy example after this list).
- Higher hallucination rates on benchmarks like PersonQA highlight the trade-offs that come with enhanced model reasoning.
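To make that trade-off concrete, here is a minimal sketch with hypothetical numbers (not OpenAI's reported PersonQA figures): a model that attempts more questions can produce more correct answers in absolute terms while its hallucination rate still rises.

```python
# Toy illustration with made-up numbers: attempting more questions can raise
# both the count of correct claims and the hallucination rate.

def summarize(attempted: int, correct: int, total: int = 100) -> dict:
    """Summarize accuracy and hallucination behavior over `total` questions."""
    wrong = attempted - correct  # confidently stated but incorrect answers
    return {
        "correct_claims": correct,
        "hallucinated_claims": wrong,
        "hallucination_rate": wrong / attempted,  # share of attempts that are wrong
        "abstained": total - attempted,
    }

# Cautious model: attempts 50 of 100 questions, gets 40 right.
print(summarize(attempted=50, correct=40))   # 40 correct, 10 hallucinated, rate 0.20

# Bolder reasoning model: attempts 90 of 100, gets 60 right.
print(summarize(attempted=90, correct=60))   # 60 correct, 30 hallucinated, rate ~0.33
```

Under these assumed numbers, the bolder model delivers more correct answers overall yet hallucinates on a larger share of its attempts, which mirrors the pattern described for o3.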
o3 Model Hallucinates Actions
- Transluce observed o3 claiming to have run code on hardware it does not have access to, evidence that hallucinations appear in its reasoning steps.
- This example illustrates how reinforcement learning may exacerbate hallucination issues in the o-series models.