
More Language, Less Labeling with Kate Saenko - #580
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Exploring the Convergence of Vision and Language in Multimodal Learning
This chapter explores how vision and language are integrated in multimodal machine learning, highlighting earlier advances such as audio-visual speech recognition. It covers key recent models such as DALL·E 2 and CLIP, emphasizing the role of unlabelled data in improving performance, especially in zero-shot learning settings.