
107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal
NLP Highlights
Cross-Attention Blocks in Vision-Language Models?
Cross attention is an old idea. It's used in vision-language tasks, and it's also used in summarization. Another example is using the BiDAF model to handle reading comprehension. That makes sense. This is very similar to the BiDAF idea: we have two stacks, and the cross-attention layers build higher-level representations of the connections between them.
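For readers who want a concrete picture of the two-stack idea discussed here, the following is a minimal PyTorch sketch of a single cross-attention block with a language stream and a vision stream that first attend to each other and then to themselves. The class name, dimensions, and layer ordering are illustrative assumptions, not the speakers' actual implementation.

```python
# A minimal sketch (assumed structure, not the speakers' code) of one
# cross-modality layer: two streams connected by cross attention, in the
# spirit of vision-language models such as LXMERT.
import torch
import torch.nn as nn


class CrossModalLayer(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        # Each stream attends to the other (cross attention), then to
        # itself (self attention), then goes through a feed-forward net.
        self.lang_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lang_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lang_ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.vis_ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(6)])

    def forward(self, lang, vis):
        # Cross attention: language queries attend over vision keys/values,
        # and vice versa, building representations of the connections
        # between the two modalities.
        l = self.norms[0](lang + self.lang_cross(lang, vis, vis)[0])
        v = self.norms[1](vis + self.vis_cross(vis, lang, lang)[0])
        # Self attention within each stream.
        l = self.norms[2](l + self.lang_self(l, l, l)[0])
        v = self.norms[3](v + self.vis_self(v, v, v)[0])
        # Position-wise feed-forward networks.
        l = self.norms[4](l + self.lang_ffn(l))
        v = self.norms[5](v + self.vis_ffn(v))
        return l, v


# Example: a batch of 2 items with 20 language tokens and 36 visual
# region features each (sizes chosen only for illustration).
lang = torch.randn(2, 20, 768)
vis = torch.randn(2, 36, 768)
lang_out, vis_out = CrossModalLayer()(lang, vis)
```

Stacking several such layers is one way to build the higher-level cross-modal representations the conversation refers to.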