

Ep#12 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Jun 10, 2025
Florent Bartoccioni, a researcher at Valeo AI working on world models and learning from unannotated data for autonomous driving, joins the episode to discuss VaViM and VaVAM. He details the limitations of traditional human-annotated pipelines and makes the case for self-supervised learning. The conversation covers the importance of diverse datasets and the power of video generative modeling, including advances in spatio-temporal embeddings and denoising approaches. Bartoccioni also explains how crash scenario simulations can refine training for safer autonomous vehicles.
AI Snips
Limits of Expert-Defined Systems
- Expert-defined representations in autonomous driving are rigid and incomplete.
- They fail to detect or handle undefined or rare road events effectively.
Generative Models Learn Driving World
- Video generative models can implicitly learn world dynamics for driving.
- They predict future frames and capture geometry and semantics without human annotation (a minimal sketch of this objective follows below).
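The next-frame objective needs no human labels: the model sees a few past frames and is trained to reconstruct the frame that actually comes next. Below is a minimal, hypothetical PyTorch sketch of that idea; the tiny 3D-convolutional model, clip length, and L2 loss are illustrative assumptions, not the VaViM/VaVAM architecture or training objective.

```python
# Hypothetical sketch of self-supervised next-frame prediction.
# Model, sizes, and loss are illustrative assumptions, not the VaViM/VaVAM implementation.
import torch
import torch.nn as nn

class TinyVideoPredictor(nn.Module):
    """Encodes a short clip of past frames and predicts the next frame."""
    def __init__(self, context_frames: int = 4, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(                            # spatio-temporal features
            nn.Conv3d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=(context_frames, 1, 1)),  # collapse the time axis
            nn.ReLU(),
        )
        self.decoder = nn.Conv2d(32, channels, kernel_size=3, padding=1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        feats = self.encoder(clip).squeeze(2)                    # (B, 32, H, W)
        return self.decoder(feats)                               # predicted next frame

model = TinyVideoPredictor()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Fake unlabeled driving video: 8-frame clips at 64x64 resolution.
video = torch.rand(2, 3, 8, 64, 64)
context, target = video[:, :, :4], video[:, :, 4]                # past frames -> next frame

pred = model(context)
loss = nn.functional.mse_loss(pred, target)                      # supervision comes from the video itself
loss.backward()
optim.step()
```

The design point is that the supervisory signal comes from the video itself, so any unannotated driving footage can be used for training.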
Training Strategy With Mixed Data
- First pre-train on large uncalibrated data for general features.
- Then fine-tune on smaller, calibrated datasets closer to the target domain (see the sketch after this list).
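This strategy can be expressed as two consecutive training stages that differ only in the data they see and how aggressively they update the model. The sketch below is a hypothetical, self-contained illustration; the toy model, synthetic tensors, epoch counts, and learning rates are assumptions, not the actual VaViM/VaVAM recipe.

```python
# Hypothetical two-stage recipe: pre-train on large uncalibrated video,
# then fine-tune on a smaller calibrated set near the target domain.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3 * 32 * 32))

def run_stage(loader: DataLoader, epochs: int, lr: float) -> None:
    """One training stage: fresh optimizer, fixed epochs, next-frame L2 loss."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for frame, next_frame in loader:
            optim.zero_grad()
            pred = model(frame).view_as(next_frame)
            loss = nn.functional.mse_loss(pred, next_frame)
            loss.backward()
            optim.step()

# Synthetic stand-ins for the two data sources: many uncalibrated clips, few calibrated ones.
uncalibrated = TensorDataset(torch.rand(256, 3, 32, 32), torch.rand(256, 3, 32, 32))
calibrated = TensorDataset(torch.rand(32, 3, 32, 32), torch.rand(32, 3, 32, 32))

# Stage 1: large-scale pre-training for general spatio-temporal features.
run_stage(DataLoader(uncalibrated, batch_size=16), epochs=2, lr=1e-4)

# Stage 2: fine-tuning closer to the target domain, with a lower learning rate
# so the pre-trained features are adapted rather than overwritten.
run_stage(DataLoader(calibrated, batch_size=8), epochs=2, lr=1e-5)
```

Keeping the fine-tuning learning rate below the pre-training one is a common way to adapt the general features to the calibrated domain without erasing them.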