
Infinite Curiosity Pod with Prateek Joshi
Diffusion LLMs: The Fastest LLMs Ever Built | Stefano Ermon, co-founder of Inception Labs
Oct 9, 2025
In this discussion, Stefano Ermon, a Stanford associate professor and co-founder of Inception Labs, dives into Diffusion Language Models. He explains how these models differ from traditional autoregressive techniques, highlighting breakthroughs in parallel refinement for text and code generation. Stefano also shares insights on engineering challenges, the importance of high-quality data, and commercial viability, and discusses the future potential of diffusion LLMs in coding and multimodal applications.
Episode notes
Parallel Refinement Replaces Left-To-Right Generation
- Diffusion LLMs generate text by starting with a rough whole-sequence guess and iteratively refining it in parallel.
- This contrasts with autoregressive LLMs that generate tokens left-to-right, one step at a time.
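The contrast between the two generation strategies can be sketched with a toy example. This is not a real model: `toy_denoiser` is a hypothetical stand-in that simply reveals tokens of a fixed target sentence, assumed here only to illustrate how a diffusion LLM predicts all positions in parallel and re-masks some of them each refinement step, while an autoregressive baseline makes one sequential call per token.

```python
import random

TARGET = "diffusion models refine whole sequences in parallel".split()
MASK = "[MASK]"

def toy_denoiser(seq):
    # Hypothetical stand-in for a learned denoiser: proposes the target
    # token at every masked position simultaneously (one parallel call).
    return [TARGET[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_generate(length, steps=3, seed=0):
    """Start from an all-masked sequence and refine it iteratively:
    each step the 'model' predicts every position at once, then a
    shrinking fraction of positions is re-masked for the next pass."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        seq = toy_denoiser(seq)              # parallel prediction over all positions
        keep = (step + 1) / steps            # anneal: keep more tokens each step
        seq = [tok if rng.random() < keep else MASK for tok in seq]
    return toy_denoiser(seq)                 # final denoising pass fills any remaining masks

def autoregressive_generate(length):
    """Left-to-right baseline: one model call per token, strictly sequential."""
    seq = []
    for i in range(length):
        seq.append(TARGET[i])                # cannot be parallelized across positions
    return seq

print(" ".join(diffusion_generate(len(TARGET))))
# prints: diffusion models refine whole sequences in parallel
```

The point of the sketch is the call pattern, not the model: the diffusion path needs only `steps + 1` parallel passes regardless of sequence length, while the autoregressive path needs one pass per token.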
Parallelism Unlocks Better Speed–Cost Tradeoffs
- Diffusion LLMs shift the fundamental trade-offs between quality, cost, and latency by enabling high parallelism.
- They leverage GPUs' parallel matrix math to be much faster and more cost-efficient than sequential autoregressive models.
From GANs To Text Diffusion Breakthrough
- Stefano described his research path from GANs to score-based diffusion models and the community shift to diffusion for images.
- He recounted a 2024 breakthrough extending diffusion math to discrete text and code, leading to an ICML best paper.
