

Video as a Universal Interface for AI Reasoning with Sherry Yang - #676
Mar 18, 2024
Sherry Yang, a Senior Research Scientist at Google DeepMind and a PhD candidate at UC Berkeley, discusses her groundbreaking work on video as a universal interface for AI reasoning. She draws parallels between video generation models and language models, highlighting their potential in real-world decision-making tasks. The conversation covers the integration of video in robotics, the challenges of effective labeling, and the exciting applications of interactive simulators. Sherry also unveils UniSim, showcasing the future of engaging with AI-generated environments.
AI Snips
Video as Unified Data Format
- Video is a unified data format that, like text, encodes rich information about the world.
- This unified format makes it possible to train a single model with a single objective, much as language models are trained to predict the next token (see the sketch below).
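To make the parallel concrete, here is a minimal sketch, not from the episode: it assumes frames have already been tokenized into discrete codes by a separate encoder (not shown), and the vocabulary size, model shape, and names are all illustrative. Once video is a token sequence, training uses exactly the cross-entropy next-token objective familiar from language models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed setup: frames already tokenized into discrete codes
# (e.g., by a VQ-style encoder, not shown). VOCAB and DIM are illustrative.
VOCAB, DIM = 1024, 256

class NextTokenVideoModel(nn.Module):
    """Autoregressive model over flattened video tokens, trained with the
    same next-token objective used for language models."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.backbone(self.embed(tokens), mask=mask)
        return self.head(x)

model = NextTokenVideoModel()
tokens = torch.randint(0, VOCAB, (2, 64))   # two sequences of video tokens
logits = model(tokens[:, :-1])              # predict each following token
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```

The point of the sketch is only that the objective is unified: nothing in the loss is video-specific once the data is tokenized.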
Challenges in Video Data
- Video data often lacks explicit labels such as actions or text descriptions, so training cannot rely on self-supervision alone.
- Unlike text, where future words serve as natural labels, controlled video generation requires specific conditioning labels that raw video does not carry (see the sketch below).
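A hedged illustration of that asymmetry, with illustrative tensors only (not code from the episode): both modalities admit shift-by-one targets, but only text supplies its own labels; steering video generation needs a signal that has to come from somewhere else.

```python
import torch

# Text: training labels come for free -- the target is just the same
# sequence shifted by one position.
text = torch.randint(0, 50_000, (1, 16))
text_inputs, text_labels = text[:, :-1], text[:, 1:]

# Video: future frames can serve as shifted targets in the same way...
frames = torch.randn(1, 8, 3, 64, 64)       # (batch, time, channels, H, W)
frame_inputs, frame_targets = frames[:, :-1], frames[:, 1:]

# ...but steering the generation (an instruction, a camera pan, a robot
# action) needs a conditioning signal that raw video does not carry, so
# it must be annotated, inferred, or pseudo-labeled.
condition = None  # absent from raw video; this is the labeling gap above
```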
Richness of Video Information
- Videos implicitly capture detailed physical information, unlike high-level language descriptions.
- This makes video ideal for tasks like learning complex procedures or visual reasoning.