
Google AI: Release Notes
How a Moonshot Led to Google DeepMind's Veo 3
Oct 16, 2025
Dumi Erhan, co-lead of the Veo project at Google DeepMind, draws on his experience in video-generation research to trace the Veo project from its moonshot beginnings to Veo 3, a model with audio capabilities. He discusses the challenge of keeping long videos coherent and how user feedback shapes future development. He also covers why image-to-video generation is hard and describes prompting methods that give users more control.
AI Snips
Origins In A 2018 Moonshot
- The Veo project began as a Google Brain moonshot in 2018 aimed at pushing video generation boundaries.
- Early work focused on video prediction and robotics use cases, which shaped the project's long-term research directions.
Evaluation And Inductive-Bias Gaps
- Despite huge quality gains since 2018, core problems like evaluation and correct inductive biases remain unsolved.
- Video lacks the clear tokenization of text, making progress and measurement harder than with LLMs.
Combine Metrics With Careful Human Eval
- Use automated metrics to rule out failing models, but rely on human evaluation for preferences and user-facing quality.
- Avoid optimizing solely for superficial human-preference signals like contrast that don't improve real capability.
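The two-stage idea above can be sketched in code. This is a minimal illustration, not the Veo team's actual evaluation pipeline: the function name `select_models`, the "lower is better" automated score, the threshold, and the pairwise vote format are all assumptions made for the example.

```python
from collections import defaultdict


def select_models(auto_scores, human_votes, max_auto_score):
    """Rank models using a two-stage evaluation (hypothetical sketch).

    auto_scores: dict of model name -> automated metric, lower is better (assumed).
    human_votes: list of (winner, loser) pairs from side-by-side human ratings.
    max_auto_score: models scoring above this are ruled out before human eval.
    """
    # Stage 1: the automated metric only rules out clearly failing models.
    survivors = {m for m, s in auto_scores.items() if s <= max_auto_score}

    # Stage 2: rank the survivors by human pairwise win rate.
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for winner, loser in human_votes:
        if winner in survivors and loser in survivors:
            wins[winner] += 1
            comparisons[winner] += 1
            comparisons[loser] += 1

    win_rate = {m: wins[m] / comparisons[m] for m in survivors if comparisons[m] > 0}
    ranked = sorted(win_rate, key=win_rate.get, reverse=True)
    return ranked, win_rate


# Made-up numbers for illustration only.
ranked, win_rate = select_models(
    auto_scores={"a": 100.0, "b": 120.0, "c": 400.0},
    human_votes=[("a", "b"), ("b", "a"), ("b", "a"), ("c", "a")],
    max_auto_score=200.0,
)
print(ranked)
```

In this sketch the automated score never decides the winner; it only filters. The final ordering comes from human comparisons among the models that passed, which reflects the advice to keep user-facing quality judgments with human raters.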
