
107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal
NLP Highlights
00:00
The Best Way to Train a Vision Plus Language and Coder?
The vising coder does not predict the vision path. Thet take the visein images as t adjiting tokens to the language instead. And iis just pretix ther crosmodalt matching. Ten nonteu reves and is qestions. If you had infinite compute and resources to get any already existing data that you could, what's your intuition for, like, the best way to train a vision plus language and coder does it? Does the question make sense?
Transcript
Play full episode