
High Agency: The Podcast for AI Builders
The End of Language-Only Models | Amit Jain, Luma AI
May 13, 2025
Amit Jain, CEO and co-founder of Luma AI and former Apple Vision Pro engineer, discusses the future of AI beyond just language models. He emphasizes the importance of multimodal training, particularly the often-overlooked role of video in AI development. Amit shares insights on how combining audio, video, and text can revolutionize industries like entertainment and advertising. He also touches on the potential for fully AI-generated feature films and critiques trend-driven approaches in AI, advocating for more meaningful innovations.
40:17
Episode notes
Podcast summary created with Snipd AI
Quick takeaways
- Luma AI emphasizes the need for multimodal training that integrates diverse data types for enhanced AI capabilities beyond just language.
- The episode highlights Luma's Inductive Moment Matching technique, which enables models to process and understand multiple modalities simultaneously.
Deep dives
Multimodal General Intelligence Development
Luma AI aims to build multimodal general intelligence by jointly training models on diverse data types such as audio, video, language, and text. This holistic approach lets the models learn in a way that mirrors human cognitive processes, improving their ability to interpret and generate content across different media. Combining these modalities from the start, rather than pre-training on language alone, is presented as a fundamental shift in AI model development. By grounding its models in a broader dataset spanning humanity's digital interactions, Luma believes it can build more effective AI systems for real-world applications.