The Pros and Cons of Using Multi-Modal Language Models

Research is mostly in multi-modal language technologies. One of the most, in my opinion, promising potential use cases for vision and language models is to improve web accessibility. Images are often used for a communicative purpose online. And so simply substituting in an accurate description is oftentimes not quite enough because there's so many ways to describe an image.

Play episode from 01:00:38

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app