

Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI
Oct 14, 2025
Santosh Vempala, a distinguished professor at Georgia Tech, dives deep into the complexities of language models and their notorious ability to hallucinate. He explains how maximum likelihood pre-training can lead to these issues and the crucial trade-offs between memorization and generalization. Through fascinating examples, he discusses how calibration impacts accuracy and presents a formal theorem linking hallucinations to misclassification. Vempala also highlights practical approaches to detect invalid model outputs and shares insights into improving AI evaluation methods.
AI Snips
Concrete Definition Of Hallucination
- Hallucinations are defined as plausible but invalid outputs drawn from a set of possible generations.
- Partitioning the possible generations into valid, invalid, and plausible-but-invalid subsets pins down what counts as a hallucination (formal sketch below).
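One compact way to write that down, with notation assumed here rather than taken from the episode:

```latex
% Illustrative notation (assumed for this sketch, not necessarily the episode's symbols):
%   X : the plausible outputs for a given prompt
%   V : the subset of X that is valid
%   p : the model's output distribution for that prompt
\[
  \mathcal{H} \;=\; \mathcal{X} \setminus \mathcal{V}
  \qquad \text{(plausible but invalid outputs, i.e. hallucinations)}
\]
\[
  \text{hallucination rate} \;=\; \Pr_{y \sim p}\bigl[\, y \in \mathcal{H} \,\bigr].
\]
```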
Pre-Training Encourages Hallucination
- Pre-training encourages models to match the training distribution, which can produce hallucinations when asked about unseen facts.
- Post-training reduces hallucination but can break calibration (a standard reading of calibration is sketched below).
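Calibration can be read in the usual sense, sketched below with assumed notation: among the outputs the model produces with confidence near p, roughly a p fraction should be valid.

```latex
% A standard notion of calibration (assumed reading, not a quote from the episode):
%   V       : the set of valid outputs
%   \hat{p} : the confidence the model attaches to an output y
\[
  \Pr\bigl[\, y \in \mathcal{V} \bigm| \hat{p}(y) = p \,\bigr] \;\approx\; p
  \qquad \text{for every confidence level } p \in [0,1].
\]
```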
Alan Turing Example Shows Calibration Tradeoff
- Santosh gave a toy dataset in which half the training examples repeat Alan Turing's death date and the rest are random name/date pairs that each appear only once.
- A calibrated model fitting that data will hallucinate roughly 50% of the time, emitting name/date pairs that never appeared in training (see the simulation sketch below).
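A rough simulation of that toy setup, offered only as a sketch: every name, date, and dataset size below is made up, and the "model" is just a sampler that mirrors the training mix, putting half its mass on the memorized Turing fact and half on name/date pairs whose true pairings it never learned.

```python
import random

random.seed(0)

# Hypothetical universe of people and dates (all made up for illustration).
names = [f"Person-{i}" for i in range(20_000)]
dates = [f"{y}-{m:02d}-{d:02d}"
         for y in range(1900, 2000) for m in range(1, 13) for d in range(1, 29)]

# Ground truth: each person has exactly one true death date.
truth = {name: random.choice(dates) for name in names}
truth["Alan Turing"] = "1954-06-07"

# Toy training set: half repeats of the Turing fact, half one-off facts.
n_train = 10_000
train = [("Alan Turing", truth["Alan Turing"])] * (n_train // 2)
train += [(name, truth[name]) for name in random.sample(names, n_train // 2)]

# Share of training mass on the heavily repeated fact (0.5 here).
repeated_share = train.count(("Alan Turing", truth["Alan Turing"])) / len(train)

def generate():
    # A toy "calibrated" generator that mirrors the training mix: with the
    # repeated fact's share of mass it reproduces what it memorized; otherwise
    # it emits a plausible-looking name/date pair whose true pairing it never
    # learned (the one-off facts are effectively noise to it).
    if random.random() < repeated_share:
        return ("Alan Turing", truth["Alan Turing"])
    return (random.choice(names), random.choice(dates))

samples = [generate() for _ in range(100_000)]
wrong = sum(1 for name, date in samples if truth[name] != date)
print(f"hallucination rate ~ {wrong / len(samples):.3f}")  # comes out near 0.5
```

Running it prints a hallucination rate near 0.5, matching the fraction of one-off facts in the training data.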