

Ep#12 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Jun 10, 2025
Florent Bartoccioni, a researcher at Valeo AI working on world models and learning from unannotated data for autonomous driving, joins the episode to discuss VaViM and VaVAM. He details the limitations of traditional human-annotated pipelines and makes the case for self-supervised learning. The conversation covers the importance of diverse datasets and the power of video generative modeling, including advances in spatio-temporal embeddings and denoising approaches. Bartoccioni also explains how crash scenario simulations can refine training for safer autonomous vehicles.
AI Snips
Limits of Expert-Defined Systems
- Expert-defined representations in autonomous driving are rigid and incomplete.
- They fail to detect or handle undefined or rare road events effectively.
Generative Models Learn Driving World
- Video generative models can implicitly learn world dynamics for driving.
- They predict future frames and capture geometry and semantics without human annotation (a minimal sketch of this objective follows below).
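The next-frame objective needs no human labels: the model sees a few past frames and is trained to reconstruct the frame that actually comes next. Below is a minimal, hypothetical PyTorch sketch of that idea; the tiny 3D-convolutional model, clip length, and L2 loss are illustrative assumptions, not the VaViM/VaVAM architecture or training objective.

```python
# Hypothetical sketch of self-supervised next-frame prediction.
# Model, sizes, and loss are illustrative assumptions, not the VaViM/VaVAM implementation.
import torch
import torch.nn as nn

class TinyVideoPredictor(nn.Module):
    """Encodes a short clip of past frames and predicts the next frame."""
    def __init__(self, context_frames: int = 4, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(                            # spatio-temporal features
            nn.Conv3d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=(context_frames, 1, 1)),  # collapse the time axis
            nn.ReLU(),
        )
        self.decoder = nn.Conv2d(32, channels, kernel_size=3, padding=1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        feats = self.encoder(clip).squeeze(2)                    # (B, 32, H, W)
        return self.decoder(feats)                               # predicted next frame

model = TinyVideoPredictor()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Fake unlabeled driving video: 8-frame clips at 64x64 resolution.
video = torch.rand(2, 3, 8, 64, 64)
context, target = video[:, :, :4], video[:, :, 4]                # past frames -> next frame

pred = model(context)
loss = nn.functional.mse_loss(pred, target)                      # supervision comes from the video itself
loss.backward()
optim.step()
```

The design point is that the supervisory signal comes from the video itself, so any unannotated driving footage can be used for training.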
Training Strategy With Mixed Data
- First pre-train on large uncalibrated data for general features.
- Then fine-tune on smaller, calibrated datasets closer to the target domain (see the sketch after this list).
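This strategy can be expressed as two consecutive training stages that differ only in the data they see and how aggressively they update the model. The sketch below is a hypothetical, self-contained illustration; the toy model, synthetic tensors, epoch counts, and learning rates are assumptions, not the actual VaViM/VaVAM recipe.

```python
# Hypothetical two-stage recipe: pre-train on large uncalibrated video,
# then fine-tune on a smaller calibrated set near the target domain.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3 * 32 * 32))

def run_stage(loader: DataLoader, epochs: int, lr: float) -> None:
    """One training stage: fresh optimizer, fixed epochs, next-frame L2 loss."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for frame, next_frame in loader:
            optim.zero_grad()
            pred = model(frame).view_as(next_frame)
            loss = nn.functional.mse_loss(pred, next_frame)
            loss.backward()
            optim.step()

# Synthetic stand-ins for the two data sources: many uncalibrated clips, few calibrated ones.
uncalibrated = TensorDataset(torch.rand(256, 3, 32, 32), torch.rand(256, 3, 32, 32))
calibrated = TensorDataset(torch.rand(32, 3, 32, 32), torch.rand(32, 3, 32, 32))

# Stage 1: large-scale pre-training for general spatio-temporal features.
run_stage(DataLoader(uncalibrated, batch_size=16), epochs=2, lr=1e-4)

# Stage 2: fine-tuning closer to the target domain, with a lower learning rate
# so the pre-trained features are adapted rather than overwritten.
run_stage(DataLoader(calibrated, batch_size=8), epochs=2, lr=1e-5)
```

Keeping the fine-tuning learning rate below the pre-training one is a common way to adapt the general features to the calibrated domain without erasing them.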