How a Moonshot Led to Google DeepMind's Veo 3


Oct 16, 2025

Dumi Erhan, co-lead of the Veo project at Google DeepMind, draws on his long experience in video-generation research. He traces the Veo project from its moonshot beginnings to Veo 3, the model that added audio generation. Dumi discusses the challenge of keeping long videos coherent and how user feedback shapes future development. He also explores the complexity of image-to-video generation and highlights prompting methods that give users more control.

ANECDOTE

Origins in a 2018 Moonshot

The Veo project began in 2018 as a Google Brain moonshot aimed at pushing the boundaries of video generation.
Early work focused on video prediction and robotics use cases, which shaped the project's long-term research directions.

INSIGHT

Evaluation and Inductive-Bias Gaps

Despite huge quality gains since 2018, core problems such as evaluation and choosing the right inductive biases remain unsolved.
Unlike text, video has no obvious tokenization, which makes progress harder to achieve and to measure than with LLMs.

ADVICE

Combine Metrics with Careful Human Eval

Use automated metrics to rule out failing models, but rely on human evaluation to judge preferences and user-facing quality.
Avoid optimizing solely for superficial signals that sway human raters, such as contrast, since they don't improve the model's real capability.