Developing large video models poses significant challenges compared to text-based models because of the complexity and richness of video data. Unlike text, where a model predicts a distribution over a finite vocabulary of discrete tokens, video prediction requires modeling distributions over all possible future frames: high-dimensional continuous spaces whose fine-grained detail makes accurate prediction difficult. Models that incorporate latent variables to capture unobserved information have largely failed, as have attempts to use neural networks, GANs, VAEs, and related methods to predict missing parts of videos or images. These failures contrast with the success of analogous prediction methods in text-based models such as LLMs, underscoring how much harder it is to learn effective representations for video data.
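To make the dimensionality gap concrete, here is a back-of-the-envelope sketch comparing the output space of next-token prediction with that of next-frame prediction. The vocabulary size and frame resolution below are illustrative assumptions, not figures from the text or from any specific model.

```python
# Back-of-the-envelope comparison of output spaces for next-token vs
# next-frame prediction. Values are illustrative assumptions.

VOCAB_SIZE = 50_000                     # assumed LLM vocabulary (discrete tokens)
HEIGHT, WIDTH, CHANNELS = 256, 256, 3   # assumed resolution of one RGB frame

# A text model outputs a single categorical distribution over the vocabulary:
text_output_dim = VOCAB_SIZE

# A video model must describe a distribution over every pixel value of the
# next frame, a continuous space with this many dimensions:
frame_dim = HEIGHT * WIDTH * CHANNELS

print(text_output_dim)  # 50000
print(frame_dim)        # 196608
```

Even at this modest resolution, a single frame is a continuous space of roughly 200,000 dimensions, and that is before accounting for temporal dependencies across many frames, which is one way to see why distribution modeling that works for discrete text does not transfer directly to video.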