
107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal
NLP Highlights
What Is the Language Embedding?
An image is naturally a two-dimensional array, with height and width. As for the language embedding, it is just a sequence of words with their position embeddings. The tool we use here is an object detector: the object detector tries to detect some meaningful objects in the image. These are just rectangles on the image that contain meaningful objects, labels, or something like that. Then we just use these objects as the input features, together with the position embeddings, just as in language. So this is the general idea.
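To make that concrete, here is a minimal sketch of how detected regions could be turned into an input sequence, much like word embeddings plus position embeddings. It is an illustration under assumptions, not the speakers' implementation: the helper names (make_linear, normalize_box, embed_regions), the toy dimensions, the random stand-in features, and the choice to sum a projected region feature with a projected box position are all hypothetical.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a function applying a randomly initialised linear layer (hypothetical helper)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

    return apply


def normalize_box(box, width, height):
    """Scale pixel coordinates (x1, y1, x2, y2) into the range [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def embed_regions(features, boxes, width, height, feat_proj, pos_proj):
    """Build one vector per detected region: projected feature + projected box position."""
    embedded = []
    for feat, box in zip(features, boxes):
        f = feat_proj(feat)
        p = pos_proj(normalize_box(box, width, height))
        embedded.append([a + b for a, b in zip(f, p)])
    return embedded


rng = random.Random(0)
FEAT_DIM, HIDDEN = 8, 4  # toy sizes chosen for the example only

feat_proj = make_linear(FEAT_DIM, HIDDEN, rng)
pos_proj = make_linear(4, HIDDEN, rng)

# Stand-ins for object detector output on a 640x480 image (assumed values).
boxes = [(10, 20, 110, 220), (300, 40, 620, 400)]
features = [[rng.random() for _ in range(FEAT_DIM)] for _ in boxes]

sequence = embed_regions(features, boxes, 640, 480, feat_proj, pos_proj)
print(len(sequence), len(sequence[0]))  # prints "2 4": two regions, each a 4-dim vector
```

The resulting list plays the same role for the image that the sequence of word embeddings plays for the sentence, so both can be fed to a transformer in the same way.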