RoboPapers

Ep#12 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Jun 10, 2025
Florent Bartoccioni, a researcher at valeo.ai focusing on world models and learning from unannotated data for autonomous driving, joins the discussion. He details the limitations of traditional human-annotated systems and advocates for self-supervised learning. The conversation covers the importance of diverse datasets and the power of video generative modeling, including advances in spatio-temporal embeddings and denoising approaches. Bartoccioni also explains how crash-scenario simulations can refine training for safer autonomous vehicles.
AI Snips
INSIGHT

Limits of Expert-Defined Systems

  • Expert-defined representations in autonomous driving are rigid and incomplete.
  • They fail to detect or handle undefined or rare road events effectively.
INSIGHT

Generative Models Learn Driving World

  • Video generative models can implicitly learn world dynamics for driving.
  • By predicting future frames, they capture geometry and semantics without human annotation (see the sketch below).
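A minimal, hypothetical sketch of the idea in this snip: treat a driving video as a sequence of discrete tokens (e.g. from a pretrained video tokenizer) and train a causal transformer to predict the next token, so that learning to continue the video forces the model to pick up scene dynamics without labels. The vocabulary size, model dimensions, and sequence lengths below are illustrative assumptions, not the VaViM configuration.

```python
import torch
import torch.nn as nn


class NextTokenVideoModel(nn.Module):
    """Toy causal transformer over flattened spatio-temporal video tokens."""

    def __init__(self, vocab_size=1024, dim=256, depth=4, heads=8, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids for tokenized video patches
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # causal mask: each position may only attend to past tokens
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1
        )
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq_len, vocab_size) logits


def training_step(model, tokens, optimizer):
    """One self-supervised step: predict token t+1 from tokens up to t."""
    logits = model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = NextTokenVideoModel()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    fake_tokens = torch.randint(0, 1024, (2, 128))  # stand-in for tokenized clips
    print(training_step(model, fake_tokens, opt))
```

The self-supervised loss needs no bounding boxes or maps; the only supervision signal is the video itself.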
ADVICE

Training Strategy With Mixed Data

  • First pre-train on large uncalibrated data for general features.
  • Then fine-tune on smaller, calibrated datasets closer to the target domain (see the sketch after this list).
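A hedged sketch (not the authors' code) of this mixed-data recipe: the same training loop runs twice, first over a large uncalibrated corpus with a higher learning rate, then over a smaller calibrated driving dataset with a lower learning rate. It reuses the toy model from the earlier sketch; the dataset stand-ins, step counts, and learning rates are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_stage(model, loader, steps, lr, device="cpu"):
    """One training stage: iterate `steps` minibatches with a next-token loss."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train().to(device)
    data_iter = iter(loader)
    for step in range(steps):
        try:
            (tokens,) = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            (tokens,) = next(data_iter)
        tokens = tokens.to(device)
        logits = model(tokens[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Random tokens as stand-ins for tokenized video clips.
    big_uncalibrated = TensorDataset(torch.randint(0, 1024, (256, 64)))
    small_calibrated = TensorDataset(torch.randint(0, 1024, (32, 64)))
    model = NextTokenVideoModel()  # toy model from the earlier sketch
    # Stage 1: general pre-training on the large uncalibrated corpus.
    run_stage(model, DataLoader(big_uncalibrated, batch_size=8, shuffle=True),
              steps=200, lr=3e-4)
    # Stage 2: fine-tuning on the smaller calibrated, in-domain dataset.
    run_stage(model, DataLoader(small_calibrated, batch_size=8, shuffle=True),
              steps=50, lr=3e-5)
```

The design choice is that scale buys general visual features while the calibrated data, though small, aligns the model with the target camera setup and domain.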