

More Language, Less Labeling with Kate Saenko - #580
Jun 27, 2022
In this discussion, Kate Saenko, an associate professor at Boston University and a consulting professor at the MIT-IBM Watson AI Lab, dives into the world of multimodal learning. Kate highlights the significance of integrating vision and language, pointing to innovations like DALL-E 2 and CLIP. She addresses bias in AI models trained on vast online datasets and shares insights on reducing labeling costs through effective prompting techniques. The conversation also touches on the challenges facing smaller labs in a resource-dominated landscape, along with strategies for robust model generalization.
AI Snips
Early AI Interest
- Kate Saenko's interest in AI stemmed from a childhood fascination with science fiction and robots.
- Her academic journey began with speech recognition at MIT, later transitioning to computer vision.
Computer Vision's Transformation
- Computer vision has dramatically improved over time.
- Early computer vision worked only for narrow applications like face detection, unlike today's broadly capable systems.
Multimodal Learning Breakthroughs
- Multimodal learning, like using audio and visuals for lip reading, has existed for a long time.
- The current breakthroughs stem from increased data and model sizes, enabling emergent properties in models like DALL-E 2.