107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal

NLP Highlights

The Best Way to Train a Vision-Plus-Language Encoder?

The vision encoder does not predict the vision part. They take the visual images as [unclear] tokens to the language instead. And it just predicts the cross-modal matching. [unclear] If you had infinite compute and resources to get any already existing data that you could, what's your intuition for, like, the best way to train a vision-plus-language encoder? Does the question make sense?
