BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Yannic Kilcher Videos (Audio Only)

Training a Filter and Captioner for Unified Vision-Language Understanding

In this chapter, they discuss how the filter and the captioner are trained for unified vision-language understanding. Both are fine-tuned on a supervised dataset, such as COCO: the filter learns to judge whether an image and a text belong together, and the captioner learns to generate synthetic captions for images. The filtered and synthetic pairs then replace noisy web captions in the pre-training data.
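A minimal sketch of this bootstrapping step (CapFilt in the paper) is below, using the publicly released BLIP checkpoints available through Hugging Face transformers. The checkpoint names, the 0.5 acceptance threshold, and the keep-both-candidates loop are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of CapFilt: caption web images, then keep only image-text
# pairs that the image-text matching (ITM) filter accepts.
import torch
from PIL import Image
from transformers import (
    BlipProcessor,
    BlipForConditionalGeneration,
    BlipForImageTextRetrieval,
)

# Captioner: generates a synthetic caption for an image.
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Filter: an ITM model fine-tuned on COCO that scores whether an
# image and a text belong together.
itm_processor = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
itm_model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")


def synthetic_caption(image: Image.Image) -> str:
    inputs = cap_processor(images=image, return_tensors="pt")
    out = captioner.generate(**inputs, max_new_tokens=30)
    return cap_processor.decode(out[0], skip_special_tokens=True)


def match_probability(image: Image.Image, text: str) -> float:
    inputs = itm_processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        # itm_score holds two-way logits; index 1 is the "matched" class.
        logits = itm_model(**inputs).itm_score
    return torch.softmax(logits, dim=-1)[0, 1].item()


def capfilt(pairs, threshold=0.5):
    """Keep web and synthetic captions whose ITM score passes the threshold.

    `pairs` is a list of (PIL image, noisy web caption) tuples; the
    threshold value here is a hypothetical choice for illustration.
    """
    clean = []
    for image, web_text in pairs:
        for text in (web_text, synthetic_caption(image)):
            if match_probability(image, text) >= threshold:
                clean.append((image, text))
    return clean
```

In the paper both candidate captions are scored independently, so a web image can contribute its original caption, a synthetic one, both, or neither to the cleaned dataset.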
