
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding & Generation

Yannic Kilcher Videos (Audio Only)

00:00

Bootstrapping Method for Vision-Language Understanding

This chapter discusses BLIP's bootstrapping method for vision-language pre-training. A captioner is trained to generate synthetic captions for images collected from the internet, and a filter is trained to remove noisy image-text pairs, so that a large amount of noisy web data can be turned into cleaner training data. The speakers also walk through the model architecture and the dataset used, and highlight directions for future research such as dynamic composition of modules and multitask pre-training.
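To make the caption-and-filter idea concrete, here is a minimal sketch of one bootstrapping pass over noisy web pairs. It is a simplification, not BLIP's actual code: `caption_fn` and `match_fn` are hypothetical stand-ins for the trained captioner and filter, images are represented by plain strings, and the `threshold` value is an arbitrary assumption. For every web image, the sketch generates one synthetic caption, then keeps each candidate text (the original web text and the synthetic caption) only if the filter scores it as matching the image.

```python
from typing import Callable, List, Tuple

# A (image, text) pair; images are stand-in strings in this sketch.
Pair = Tuple[str, str]


def bootstrap_dataset(
    web_pairs: List[Pair],
    caption_fn: Callable[[str], str],
    match_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> List[Pair]:
    """One simplified caption-and-filter pass over noisy web pairs.

    caption_fn: hypothetical captioner mapping an image to a synthetic caption.
    match_fn: hypothetical filter scoring how well a text matches an image.
    """
    cleaned: List[Pair] = []
    for image, web_text in web_pairs:
        synthetic = caption_fn(image)
        # Filter both the original web text and the synthetic caption.
        for text in (web_text, synthetic):
            if match_fn(image, text) >= threshold:
                cleaned.append((image, text))
    return cleaned


# Toy stand-ins for the trained models (hypothetical, for illustration only).
def toy_captioner(image: str) -> str:
    return f"a photo of a {image}"


def toy_matcher(image: str, text: str) -> float:
    # Score 1.0 if the text mentions the image content, else 0.0.
    return 1.0 if image in text else 0.0


if __name__ == "__main__":
    pairs = [("dog", "my dog at the beach"), ("cat", "click here to buy now")]
    for pair in bootstrap_dataset(pairs, toy_captioner, toy_matcher):
        print(pair)
```

With these toy models, the relevant web text for the dog image survives alongside its synthetic caption, while the spam-like text for the cat image is filtered out and only the synthetic caption is kept for that image.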
