

[Linkpost] “Gemini Diffusion: watch this space” by Yair Halberstadt
May 22, 2025
Google DeepMind's Gemini Diffusion takes a different approach to generating text: instead of predicting tokens one by one, it iteratively denoises the entire output. The speaker highlights its impressive speed, nearly 1000 tokens per second, and recalls a personal encounter where it aced a Google interview question in seconds. While it's not flawless, its performance suggests diffusion-based language models could go well beyond what current language models offer.
AI Snips
Diffusion vs Token Prediction
- Diffusion models iteratively denoise all output tokens in parallel until a coherent result emerges, unlike autoregressive LLMs, which predict one token at a time (see the toy sketch after this list).
- This approach is inspired by image diffusion models and brings its own generation advantages.
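A minimal toy sketch of how the two decoding loops differ. The vocabulary, `toy_predict` function, and step count are made-up stand-ins for illustration only; nothing here reflects Gemini Diffusion's actual architecture or API, which has not been published in detail.

```python
# Toy contrast: autoregressive decoding vs diffusion-style iterative denoising.
# The "model" is a random stand-in over a tiny vocabulary (an assumption for
# illustration), not a real language model.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]
MASK = "<mask>"

def toy_predict(draft, position):
    """Placeholder model: propose a token for one position given the draft."""
    random.seed(hash((tuple(draft), position)) % (2**32))
    return random.choice(VOCAB)

def autoregressive_decode(length):
    # LLM-style: generate one token at a time, left to right.
    out = []
    for i in range(length):
        out.append(toy_predict(out, i))
    return out

def diffusion_decode(length, steps=5):
    # Diffusion-style: start with every position masked/noisy, then
    # re-predict all positions in parallel for a fixed number of steps.
    out = [MASK] * length
    for _ in range(steps):
        out = [toy_predict(out, i) for i in range(length)]
    return out

if __name__ == "__main__":
    print("autoregressive:", autoregressive_decode(6))
    print("diffusion-style:", diffusion_decode(6))
```

The structural point is the loop shape: the autoregressive version's runtime grows with output length one token per step, while the diffusion-style version touches every position on every step, which is why a small fixed number of steps can yield very high tokens-per-second throughput.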
Gemini Diffusion Interview Demo
- Google DeepMind's Gemini Diffusion answered my Google interview question perfectly in 2 seconds.
- It struggled a bit on follow-up questions but still significantly outperformed ChatGPT 3.
Diffusion Enables Native Editing
- Diffusion models natively support editing in the middle of an output, unlike left-to-right autoregressive LLMs (a sketch of span editing follows below).
- Because the whole output is refined together rather than committed to token by token, the internal contradictions common in LLM outputs are less likely.
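A minimal sketch of what such in-place editing could look like under these assumptions: re-mask a middle span and re-denoise only those positions while the surrounding tokens stay fixed. As above, `toy_predict`, the vocabulary, and the step count are hypothetical placeholders, not Gemini Diffusion's real interface.

```python
# Sketch of diffusion-style span editing: mask a middle region and iteratively
# re-denoise only those positions, conditioning on the fixed tokens on both
# sides -- something a left-to-right decoder cannot do without regenerating
# the whole suffix. The scoring function is a toy stand-in, not a real model.
import random

VOCAB = ["the", "dog", "ran", "over", "hill", "."]
MASK = "<mask>"

def toy_predict(draft, position):
    """Placeholder model: propose a token for one position given the draft."""
    random.seed(hash((tuple(draft), position)) % (2**32))
    return random.choice(VOCAB)

def edit_span(tokens, start, end, steps=5):
    # Mask only the span being edited; everything else is kept as-is.
    draft = list(tokens)
    for i in range(start, end):
        draft[i] = MASK
    # Iteratively re-fill the masked positions given the full draft.
    for _ in range(steps):
        for i in range(start, end):
            draft[i] = toy_predict(draft, i)
    return draft

if __name__ == "__main__":
    sentence = ["the", "dog", "ran", "over", "the", "hill", "."]
    print("edited:", edit_span(sentence, 2, 5))
```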