

The End of Language-Only Models | Amit Jain, Luma AI
May 13, 2025
Amit Jain, CEO and co-founder of Luma AI and former Apple Vision Pro engineer, discusses the future of AI beyond just language models. He emphasizes the importance of multimodal training, particularly the often-overlooked role of video in AI development. Amit shares insights on how combining audio, video, and text can revolutionize industries like entertainment and advertising. He also touches on the potential for fully AI-generated feature films and critiques trend-driven approaches in AI, advocating for more meaningful innovations.
Luma AI's Multimodal Vision
- Multimodal general intelligence involves joint training on audio, video, language, and text to capture the full digital footprint.
- Luma AI builds world models that learn like humans, integrating multiple modalities simultaneously rather than focusing solely on language.
Limits of Language-Only AI Models
- Current large AI labs focus heavily on language and treat other modalities as afterthoughts, leading to limitations in model capability.
- Joint training on all modalities at scale is a strategic next frontier for overcoming the data limitations that text-only models face (a rough sketch of the idea follows this list).
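To make the idea concrete, here is a minimal sketch of what joint training over text, audio, and video can look like: per-modality encoders project everything into one shared embedding space, and a single backbone is trained over the combined sequence. This is a generic illustration under assumed details, not Luma's actual design; the dimensions, encoder choices, and the `JointMultimodalModel` name are all hypothetical.

```python
# Illustrative sketch of joint multimodal training (assumed details,
# not Luma's actual architecture).
import torch
import torch.nn as nn

D = 256  # shared embedding width (hypothetical)

class JointMultimodalModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-modality projections into the shared space.
        self.text_proj = nn.Embedding(32000, D)   # discrete text tokens
        self.audio_proj = nn.Linear(80, D)         # e.g. mel-spectrogram frames
        self.video_proj = nn.Linear(768, D)        # e.g. per-frame patch features
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, text_ids, audio_frames, video_frames):
        # Embed each modality, then model one joint sequence.
        seq = torch.cat([
            self.text_proj(text_ids),
            self.audio_proj(audio_frames),
            self.video_proj(video_frames),
        ], dim=1)
        return self.backbone(seq)

model = JointMultimodalModel()
out = model(
    torch.randint(0, 32000, (2, 16)),  # batch of text token ids
    torch.randn(2, 32, 80),            # batch of audio frames
    torch.randn(2, 8, 768),            # batch of video frame features
)
print(out.shape)  # torch.Size([2, 56, 256])
```

The point of the sketch is that one backbone sees all modalities in a single sequence, rather than language being trained first and other modalities bolted on afterward.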
Continuous vs Discrete Modalities
- Language models operate on discrete tokens, whereas real-world signals like video and audio require continuous representations.
- Luma's new architecture uses a continuous latent space for joint multimodal modeling, enabling reasoning over diverse data (see the toy contrast below).
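A toy contrast makes the discrete/continuous distinction concrete: text is looked up in a finite codebook, while an encoder can map video or audio into a latent space where any real-valued vector is a valid code. The encoder shape and sizes below are illustrative assumptions; the specifics of Luma's latent design are not described in the episode.

```python
# Toy contrast between discrete and continuous representations
# (illustrative assumptions, not Luma's actual latent design).
import torch
import torch.nn as nn

# Language: each position snaps to one of a finite set of codebook rows.
vocab = nn.Embedding(32000, 256)
token_ids = torch.randint(0, 32000, (1, 16))
discrete_repr = vocab(token_ids)  # (1, 16, 256), drawn from 32k fixed vectors

# Video/audio: an encoder maps the raw signal into a continuous latent,
# so the representation is not restricted to a finite symbol set.
encoder = nn.Sequential(
    nn.Linear(3 * 16 * 16, 512),  # hypothetical: flattened 16x16 RGB patches
    nn.GELU(),
    nn.Linear(512, 256),
)
patches = torch.randn(1, 64, 3 * 16 * 16)
continuous_latent = encoder(patches)  # (1, 64, 256), any point in R^256
```

A model trained jointly over both kinds of input can then operate in this shared continuous space instead of forcing every signal through a discrete token vocabulary.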