
Scaling Multi-Modal Generative AI with Luke Zettlemoyer - #650

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)


The Potential of Grounding Text with Images

Models that incorporate additional data such as video and images can produce striking, highly compositional results. A piece of text strongly predicts the text that follows it, whereas the relationship between text and images is less direct. The goal is to add images to a text-only language model in a way that improves its performance. Researchers are working on scaling laws for competition between modalities, which aim to predict when bimodal training will outperform unimodal training.
