Scaling Multi-Modal Generative AI with Luke Zettlemoyer - #650

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The Potential of Grounding Text with Images

Models that incorporate additional modalities such as video and images can produce impressive, highly compositional results. In text, each piece strongly predicts the next piece of text, but the relationship between text and images is less direct. The goal is to add images to a text-only language model in a way that improves its performance. Researchers are working on scaling laws for multimodal competition that predict when bimodal training will outperform unimodal training.


The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app