
Infinite Curiosity Pod with Prateek Joshi
Diffusion LLMs: The Fastest LLMs Ever Built | Stefano Ermon, co-founder of Inception Labs
Oct 9, 2025
In this discussion, Stefano Ermon, a Stanford associate professor and co-founder of Inception Labs, dives into Diffusion Language Models. He explains how these models differ from traditional autoregressive techniques, highlighting breakthroughs in parallel refinement for text and code generation. Stefano also shares insights on engineering challenges, the importance of high-quality data, and commercial viability, and discusses the future potential of diffusion LLMs in coding and multimodal applications.
Episode notes
Parallel Refinement Replaces Left-To-Right Generation
- Diffusion LLMs generate text by starting with a rough whole-sequence guess and iteratively refining it in parallel.
- This contrasts with autoregressive LLMs that generate tokens left-to-right, one step at a time.
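The contrast between the two generation strategies can be sketched with a toy example. This is not a real model: `toy_denoiser` is a hypothetical stand-in that simply reveals tokens of a fixed target sentence, assumed here only to illustrate how a diffusion LLM predicts all positions in parallel and re-masks some of them each refinement step, while an autoregressive baseline makes one sequential call per token.

```python
import random

TARGET = "diffusion models refine whole sequences in parallel".split()
MASK = "[MASK]"

def toy_denoiser(seq):
    # Hypothetical stand-in for a learned denoiser: proposes the target
    # token at every masked position simultaneously (one parallel call).
    return [TARGET[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_generate(length, steps=3, seed=0):
    """Start from an all-masked sequence and refine it iteratively:
    each step the 'model' predicts every position at once, then a
    shrinking fraction of positions is re-masked for the next pass."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        seq = toy_denoiser(seq)              # parallel prediction over all positions
        keep = (step + 1) / steps            # anneal: keep more tokens each step
        seq = [tok if rng.random() < keep else MASK for tok in seq]
    return toy_denoiser(seq)                 # final denoising pass fills any remaining masks

def autoregressive_generate(length):
    """Left-to-right baseline: one model call per token, strictly sequential."""
    seq = []
    for i in range(length):
        seq.append(TARGET[i])                # cannot be parallelized across positions
    return seq

print(" ".join(diffusion_generate(len(TARGET))))
# prints: diffusion models refine whole sequences in parallel
```

The point of the sketch is the call pattern, not the model: the diffusion path needs only `steps + 1` parallel passes regardless of sequence length, while the autoregressive path needs one pass per token.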
Parallelism Unlocks Better Speed–Cost Tradeoffs
- Diffusion LLMs shift the fundamental trade-offs between quality, cost, and latency by enabling high parallelism.
- They leverage GPUs' parallel matrix math to be much faster and more cost-efficient than sequential autoregressive models.
From GANs To Text Diffusion Breakthrough
- Stefano described his research path from GANs to score-based diffusion models and the community shift to diffusion for images.
- He recounted a 2024 breakthrough extending diffusion math to discrete text and code, leading to an ICML best paper.
