High Agency: The Podcast for AI Builders

The End of Language-Only Models | Amit Jain, Luma AI

May 13, 2025
Amit Jain, CEO and co-founder of Luma AI and former Apple Vision Pro engineer, discusses the future of AI beyond just language models. He emphasizes the importance of multimodal training, particularly the often-overlooked role of video in AI development. Amit shares insights on how combining audio, video, and text can revolutionize industries like entertainment and advertising. He also touches on the potential for fully AI-generated feature films and critiques trend-driven approaches in AI, advocating for more meaningful innovations.
INSIGHT

Luma AI's Multimodal Vision

  • Multimodal general intelligence involves joint training on audio, video, language, and text to capture the full digital footprint.
  • Luma AI builds world models that learn like humans, integrating multiple modalities simultaneously rather than focusing solely on language.
INSIGHT

Limits of Language-Only AI Models

  • Current large AI labs focus heavily on language and treat other modalities as afterthoughts, leading to limitations in model capability.
  • Joint training on all modalities at scale is a strategic next frontier to overcome data limitations faced by text-only models.
INSIGHT

Continuous vs Discrete Modalities

  • Language models operate on discrete tokens, whereas real-world signals like video and audio require continuous representations.
  • Luma's new architecture uses continuous latent space for joint multimodal modeling, enabling reasoning over diverse data.
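To make the distinction concrete, here is a minimal illustrative sketch (not Luma's actual architecture) contrasting the discrete integer tokens that language models consume with the continuous latent vectors that signals like video require. The toy vocabulary, frame size, and random projection standing in for a learned encoder are all assumptions for illustration.

```python
import numpy as np

# Language: text maps to a sequence of discrete integer token IDs
# drawn from a finite vocabulary (toy vocabulary for illustration).
vocab = {"multi": 0, "modal": 1, "world": 2, "model": 3}
text_tokens = np.array([vocab[w] for w in ["multi", "modal", "world", "model"]])
assert text_tokens.dtype.kind == "i"  # discrete integers from a finite set

# Video/audio: each frame is a real-valued array; a learned encoder
# (here a fixed random projection as a stand-in) maps it into a
# continuous latent space rather than a finite vocabulary.
rng = np.random.default_rng(0)
frame = rng.random((16, 16, 3))                 # one toy 16x16 RGB frame
encoder = rng.standard_normal((frame.size, 8))  # stand-in for a learned encoder
latent = frame.reshape(-1) @ encoder            # continuous 8-d latent vector
assert latent.dtype.kind == "f"                 # continuous real values

# In a joint multimodal model, text embeddings and frame latents would
# live in the same continuous space so one network can reason over both.
```

The point of the sketch is only the type difference: tokens are indices into a fixed set, while latents are unconstrained real vectors, which is why joint multimodal modeling pushes toward a shared continuous representation.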