
AI + a16z

Beyond Language: Inside a Hundred-Trillion-Token Video Model

Jul 3, 2024
Luma Chief Scientist Jiaming Song discusses Dream Machine, a model trained on vast amounts of video data that shows emergent reasoning abilities. He explains the 'bitter lesson' as applied to generative models and the shift toward spending more compute on simpler methods. The conversation covers the evolution of GANs, the limits of scaling language models, advances in fine-tuning 2D models into 3D representations, and how Dream Machine technology could revolutionize graphics.
01:05:14

Episode guests

Jiaming Song, Chief Scientist at Luma

Podcast summary created with Snipd AI

Quick takeaways

  • Tokenizing video is a different challenge from tokenizing language and requires new approaches to handle diverse datasets.
  • Dream Machine's efficient architecture enables faster training on long sequences, underscoring how much architectural design matters for large-scale models.

Deep dives

Model Training Data Comparison: Llama 3 vs. Dream Machine

The world's largest open-source language model, Llama 3, was trained on 15 trillion tokens, whereas even Dream Machine v0, the smallest Dream Machine model, was trained on hundreds of trillions of tokens.
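
To see why video data so quickly dwarfs text in token count, here is a rough back-of-envelope sketch in Python. The tokens-per-frame rate, frame rate, and dataset size below are purely hypothetical assumptions for illustration; they are not figures from the episode or from Luma.

```python
# Back-of-envelope: why video datasets reach hundreds of trillions of tokens.
# All numbers below are hypothetical assumptions for illustration only.

TOKENS_PER_FRAME = 256        # assumed output of a latent video tokenizer per frame
FRAMES_PER_SECOND = 24        # assumed frame sampling rate
HOURS_OF_VIDEO = 30_000_000   # assumed dataset size in hours

tokens_per_hour = TOKENS_PER_FRAME * FRAMES_PER_SECOND * 3600
total_tokens = tokens_per_hour * HOURS_OF_VIDEO

print(f"Tokens per hour of video: {tokens_per_hour:,}")   # ~22 million
print(f"Total tokens in dataset:  {total_tokens:,}")       # ~6.6e14, i.e. hundreds of trillions

# Compare with a text corpus the size of Llama 3's (~15 trillion tokens).
LLAMA3_TOKENS = 15_000_000_000_000
print(f"Ratio vs. Llama 3 text corpus: {total_tokens / LLAMA3_TOKENS:.0f}x")
```

Under these assumed numbers, even a modest per-frame token rate multiplied across tens of millions of hours of footage lands in the hundreds of trillions of tokens, an order of magnitude or more beyond the largest text corpora.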
