Eye On A.I.

#310 Stefano Ermon: Why Diffusion Language Models Will Define the Next Generation of LLMs

Jan 4, 2026
Stefano Ermon, co-founder and CEO of Inception and former Stanford professor, explores diffusion language models and their potential to revolutionize AI. He explains how these models generate text in parallel, improving speed and cost efficiency over the token-by-token decoding of autoregressive models. The discussion covers the architecture's controllability and safety, why efficient inference matters for broader AI applications, and what diffusion models could mean for coding workflows and voice agents. Ermon emphasizes the role of these models in shaping the next generation of generative intelligence.
ADVICE

Use Diffusion For Low-Latency Apps

  • Favor diffusion models for latency-sensitive applications where parallel inference and lower cost matter.
  • Replace small autoregressive models with diffusion LLMs to improve speed and maintain quality within a given latency budget.
INSIGHT

Parallel Generation Beats Sequential Bottlenecks

  • Diffusion LLMs generate entire outputs in parallel instead of one token at a time.
  • This enables much faster inference and better GPU parallelism than autoregressive LLMs (sketched below).
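
To make the parallelism concrete, here is a minimal, hypothetical PyTorch sketch contrasting the two decoding loops. It is not Inception's actual algorithm: `model`, `mask_id`, the greedy argmax updates, and the fixed number of refinement steps are all simplifying assumptions (real diffusion LMs use noise schedules and selective remasking). The point is the loop structure: autoregressive decoding needs one model call per new token, while the diffusion-style loop refines all positions at once over a few passes.

```python
import torch

def autoregressive_generate(model, prompt_ids, n_new):
    # Sequential decoding: each new token depends on the ones before it,
    # so generating n_new tokens requires n_new model calls in a row.
    ids = prompt_ids
    for _ in range(n_new):
        logits = model(ids)                            # (batch, seq, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids

def diffusion_generate(model, prompt_ids, n_new, n_steps, mask_id):
    # Parallel denoising (simplified): start from an all-masked block and
    # re-predict every position on each pass. Only n_steps model calls,
    # independent of n_new, which is where the latency win comes from.
    batch = prompt_ids.size(0)
    out = torch.full((batch, n_new), mask_id, dtype=torch.long)
    for _ in range(n_steps):
        logits = model(torch.cat([prompt_ids, out], dim=-1))
        out = logits[:, -n_new:].argmax(-1)            # update all n_new slots at once
    return torch.cat([prompt_ids, out], dim=-1)
```

With n_new = 500 and n_steps = 10, the autoregressive loop makes 500 sequential model calls while the diffusion-style loop makes 10, each of which keeps the GPU busy across the whole output block.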
INSIGHT

Same Transformer Context Limits Apply

  • Diffusion LLMs still use transformers and therefore share the same context-length tradeoffs.
  • Inception serves models with about 130K tokens of context and can adopt standard context-scaling techniques.