Multimodal AI is on the verge of explosive growth, but challenges such as compute limitations must first be addressed. Achieving true multimodal capability will require greater computing power and more data centers; by 2025-2026, these compute constraints are expected to ease, enabling far more powerful models. Yann LeCun argues that models should learn the way humans do: by observing the world and building an understanding of it. Synthetic data is likely to become the primary source for vision training, since simulations that follow the laws of physics allow models to be trained rapidly. Interaction through chat, voice, and vision will become more prevalent, enabling intuitive conversation and guidance, with devices such as phones, glasses, earbuds, watches, bracelets, and rings playing a significant role. This evolution is poised to reshape business environments and redefine work processes, setting the stage for an explosion of AI agents around 2025-2027.
