

More Language, Less Labeling with Kate Saenko - #580
Jun 27, 2022
In this discussion, Kate Saenko, an associate professor at Boston University and a consulting professor at the MIT-IBM Watson AI Lab, dives into the world of multimodal learning. Kate highlights the significance of integrating vision and language, pointing to innovations like DALL-E 2 and CLIP. She addresses bias in AI models trained on vast online datasets and shares insights on reducing labeling costs through effective prompting techniques. The conversation also touches on the challenges facing smaller labs in a resource-dominated landscape, along with strategies for robust model generalization.
AI Snips
Early AI Interest
- Kate Saenko's interest in AI stemmed from a childhood fascination with science fiction and robots.
- Her academic journey began with speech recognition at MIT, later transitioning to computer vision.
Computer Vision's Transformation
- Computer vision has dramatically improved over time.
- Early computer vision worked only for narrow applications like face detection, unlike today's broadly capable systems.
Multimodal Learning Breakthroughs
- Multimodal learning, like using audio and visuals for lip reading, has existed for a long time.
- The current breakthroughs stem from increased data and model sizes, enabling emergent properties in models like DALL-E 2.