Deep Papers

Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI

Oct 14, 2025
Santosh Vempala, a distinguished professor at Georgia Tech, dives deep into the complexities of language models and their notorious tendency to hallucinate. He explains how maximum-likelihood pre-training can give rise to hallucinations and the crucial trade-offs between memorization and generalization. Through concrete examples, he discusses how calibration affects accuracy and presents a formal theorem linking hallucinations to misclassification. Vempala also highlights practical approaches for detecting invalid model outputs and shares insights into improving AI evaluation methods.
INSIGHT

Concrete Definition Of Hallucination

  • Hallucinations are defined as plausible but invalid outputs drawn from a set of possible generations.
  • Partitioning the possible generations into valid, invalid, and plausible-but-invalid subsets pins down exactly what counts as a hallucination (see the notational sketch after this list).
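One way to make that precise, using notation of my own rather than anything stated in the episode: write the space of possible generations as Y, the valid outputs as V, and the plausible outputs as P. Hallucinations are then the plausible-but-invalid set, and a model's hallucination rate is the probability it lands there.

```latex
% Notation is mine, chosen only to mirror the informal definition above:
% Y = possible generations, V = valid generations, P = plausible generations.
\[
  H \;=\; P \setminus V
  \qquad \text{(the plausible-but-invalid outputs, i.e. hallucinations)}
\]
\[
  \operatorname{hall}(p) \;=\; \Pr_{y \sim p}\bigl[\, y \in P \setminus V \,\bigr]
  \qquad \text{(hallucination rate of a model } p\text{)}
\]
```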
INSIGHT

Pre-Training Encourages Hallucination

  • Pre-training encourages models to match the training distribution, which can produce hallucinations when they are asked about unseen facts (a minimal illustration follows this list).
  • Post-training reduces hallucination but can break calibration.
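A way to see the first bullet concretely: the cross-entropy objective used in pre-training is minimized, for a categorical next-token distribution, exactly at the empirical training frequencies, so the objective pushes the model to reproduce the training distribution rather than to abstain on what it has not seen. A minimal numpy sketch with made-up token counts (the counts and the Dirichlet perturbations are illustrative assumptions, not anything from the episode):

```python
import numpy as np

# Hypothetical next-token counts from a training corpus.
counts = np.array([50, 30, 15, 5])
empirical = counts / counts.sum()

def cross_entropy(q):
    # Average negative log-likelihood of the training counts under model q.
    return -(counts * np.log(q)).sum() / counts.sum()

# Compare the empirical frequencies against 1000 perturbed candidates.
rng = np.random.default_rng(0)
candidates = [empirical] + [rng.dirichlet(counts + 1) for _ in range(1000)]
losses = [cross_entropy(q) for q in candidates]
assert np.argmin(losses) == 0  # the empirical frequencies always win

print("empirical frequencies:", empirical)
print("their cross-entropy loss:", round(losses[0], 4))
```

Because the optimum is the training distribution itself, a well-fit model keeps spreading probability over plausible continuations for prompts whose facts it never observed, which is where hallucination can enter.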
ANECDOTE

Alan Turing Example Shows Calibration Tradeoff

  • Santosh described a toy dataset in which half the training examples repeat the date of Alan Turing's death and the rest are random name/date pairs.
  • A calibrated model fit to that data will hallucinate roughly 50% of the time on unseen name/date pairs (see the simulation sketch below).
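A minimal simulation of that toy setup, under the assumption that the model memorizes the repeated Turing fact but can only guess a plausible date for names it effectively saw once; the names, date format, and 50/50 query mix are illustrative, not the construction from the episode:

```python
import random

random.seed(0)

N = 10_000
TURING_FACT = ("Alan Turing", "7 June 1954")

def random_fact(i):
    # One-off facts: fictional names paired with arbitrary plausible dates.
    return (f"Person {i}", f"{random.randint(1, 28)} March {random.randint(1900, 1999)}")

# Training set: half repeats of the Turing fact, half one-off random facts.
singletons = [random_fact(i) for i in range(N // 2)]
training_facts = [TURING_FACT] * (N // 2) + singletons
true_dates = dict(training_facts)  # the "world": which (name, date) pairs are valid

def model_answer(name):
    # Assumed behavior: the repeated fact is memorized; for names seen only
    # once the model falls back to sampling a plausible date.
    if name == TURING_FACT[0]:
        return TURING_FACT[1]
    return f"{random.randint(1, 28)} March {random.randint(1900, 1999)}"

# Query with prompts drawn like the training data (50/50 split).
queries = [TURING_FACT[0]] * (N // 2) + [name for name, _ in singletons]
hallucinations = sum(model_answer(name) != true_dates[name] for name in queries)
print(f"hallucination rate ~= {hallucinations / len(queries):.2f}")  # ~= 0.50
```

The printed rate lands near 0.50 because half the queries concern one-off facts the model had no way to memorize, matching the figure quoted above.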