Learning Visiolinguistic Representations with ViLBERT w/ Stefan Lee - #358

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Bridging Vision and Language

This chapter covers self-supervised learning of joint vision-and-language representations from large datasets, with a focus on the Conceptual Captions dataset. It discusses what makes training a BERT-like model on multimodal input difficult, the challenges of applying masking to images, and the use of co-attentional mechanisms that let the visual and textual representations interact.
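As a rough illustration of the co-attention idea mentioned above, here is a minimal NumPy sketch in which each modality uses its own features as queries and attends over the other modality's features. It is not taken from the episode or from the ViLBERT code: the function names (`cross_attention`, `co_attention`), the feature sizes, and the random inputs are all assumptions, and the learned projections, multiple heads, and feed-forward layers of a real transformer block are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Single-head scaled dot-product attention (hypothetical helper):
    # each query row becomes a weighted average of the context rows.
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # shape (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights

def co_attention(visual, text):
    # Visual features attend over text, and text features attend over visual.
    vis_out, vis_w = cross_attention(visual, text)
    txt_out, txt_w = cross_attention(text, visual)
    return vis_out, txt_out, vis_w, txt_w

# Made-up inputs: 4 image-region features and 6 token features, both of width 8.
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 8))
text = rng.normal(size=(6, 8))

vis_out, txt_out, vis_w, txt_w = co_attention(visual, text)
print(vis_out.shape, txt_out.shape)
```

Each output keeps the shape of the stream that issued the queries, so the two streams can continue to be processed separately while still being conditioned on each other.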
