
107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal
NLP Highlights
Do You Have a Problem With Multiple Captions?
If they do have multiple questions per image, how will you be able to leverage all of them at the same time? We just use each pair as a single training instance. It depends on what it is. If it's like machine-translation-style references, then they are probably just paraphrases saying the same thing. But if it's more like four different captions of this image that are trying to cover different aspects of the image, then it's much more interesting, because then it's sort of making sure that you fix a coverage issue. So you could actually have a loss function that makes sure that across these four captions, we have covered, like, all parts
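The two ideas in this excerpt can be sketched roughly as follows. This is only an illustration, not the speakers' implementation: the function names `expand_pairs` and `coverage_penalty`, the data layout, and the use of per-region attention weights as a stand-in for "parts of the image covered" are all assumptions, since the excerpt does not say what form the loss would take.

```python
from typing import Dict, List, Tuple


def expand_pairs(data: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Treat every (image, text) pair as its own training instance.

    `data` maps a hypothetical image id to all texts annotated for it.
    """
    return [
        (image_id, text)
        for image_id, texts in data.items()
        for text in texts
    ]


def coverage_penalty(attention_per_caption: List[List[float]]) -> float:
    """Hypothetical coverage term over the captions of one image.

    Each inner list holds one caption's attention over the image regions
    (values assumed to lie in [0, 1]). For every region we keep the
    strongest attention any caption gave it, then penalize regions that
    no caption attends to. 0.0 means every region is fully covered.
    """
    num_regions = len(attention_per_caption[0])
    best = [
        max(caption[r] for caption in attention_per_caption)
        for r in range(num_regions)
    ]
    return sum(1.0 - b for b in best) / num_regions


pairs = expand_pairs({"img1": ["a dog", "a park"], "img2": ["a cat"]})
paraphrases = coverage_penalty([[1.0, 0.0], [1.0, 0.0]])    # same region twice
complementary = coverage_penalty([[1.0, 0.0], [0.0, 1.0]])  # different regions
```

Under these assumptions, captions that are paraphrases attend to the same regions and leave the penalty high, while captions describing different aspects of the image drive it toward zero, which is the distinction drawn in the excerpt.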