Yannic Kilcher Videos (Audio Only)

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Mar 25, 2022
The podcast discusses the BLIP paper, a technique for bootstrapping the dataset used in vision-language pre-training. It explores the benefits of cross-modal pre-training and the issues it faces, such as noisy web-scraped image-text pairs. BLIP is a more versatile model that creates and improves its own training data. The episode covers the paper's contributions, the model architecture, how data flows through the model, the captioning-and-filtering bootstrapping procedure, and fine-tuning the model for downstream tasks.
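For context, the bootstrapping step mentioned above works roughly like this: a captioner generates synthetic captions for the web images, and a filter discards image-text pairs (web or synthetic) that do not match. The sketch below is only a hypothetical illustration of that loop; `captioner`, `filter_model`, and the `Pair` structure are placeholder names, not the paper's actual implementation.

```python
# Hypothetical sketch of BLIP-style caption-and-filter bootstrapping.
# `captioner` and `filter_model` stand in for fine-tuned BLIP modules;
# they are placeholders, not the paper's real API.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    image: object          # image tensor or file path
    caption: str           # associated text
    is_synthetic: bool     # True if the caption was generated, not scraped


def capfilt(
    web_pairs: List[Pair],
    captioner: Callable[[object], str],           # image -> synthetic caption
    filter_model: Callable[[object, str], bool],  # (image, caption) -> keep?
) -> List[Pair]:
    """Bootstrap a cleaner dataset from noisy web image-text pairs."""
    bootstrapped: List[Pair] = []
    for pair in web_pairs:
        # 1) Captioning: generate a synthetic caption for each web image.
        synthetic = Pair(pair.image, captioner(pair.image), is_synthetic=True)
        # 2) Filtering: keep only the pairs the filter judges as matching.
        for candidate in (pair, synthetic):
            if filter_model(candidate.image, candidate.caption):
                bootstrapped.append(candidate)
    return bootstrapped
```

The resulting bootstrapped pairs are then used to pre-train a fresh model, which is the sense in which the model "creates and improves its own dataset."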