
107 - Multi-Modal Transformers, with Hao Tan and Mohit Bansal
NLP Highlights
The Self-Attention Layer of the Transformer.
We use the same architecture here. We first have a cross-attention layer attending to the other modality, and then we have a self-attention layer. In the BiDAF model, you would first have the cross-attention layer, and then you would have a modeling layer to process the fused information better. So the self-attention is in place of this modeling. But we could consider it as two transformer decoders in parallel.
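
Below is a minimal sketch of the cross-modality layer described in this clip: cross-attention attends to the other modality first, then self-attention processes the fused information (the role of BiDAF's modeling layer), with two such streams running in parallel, one per modality. Hidden sizes, normalization placement, and the lack of parameter sharing are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossModalityLayer(nn.Module):
    """One cross-modality layer as described in the episode:
    cross-attention to the other modality, then self-attention,
    then a feed-forward block. Sizes are illustrative."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, other):
        # Cross-attention: queries from this modality, keys/values from
        # the other modality ("attend to the other modality").
        x = self.norm1(x + self.cross_attn(x, other, other)[0])
        # Self-attention stands in for the modeling layer: it further
        # processes the fused information within this modality.
        x = self.norm2(x + self.self_attn(x, x, x)[0])
        x = self.norm3(x + self.ffn(x))
        return x

# Two streams in parallel, each attending to the other --
# "two transformer decoders in parallel".
layer_lang = CrossModalityLayer()
layer_vis = CrossModalityLayer()
lang = torch.randn(2, 20, 768)  # (batch, language tokens, hidden)
vis = torch.randn(2, 36, 768)   # (batch, visual regions, hidden)
lang_out = layer_lang(lang, vis)
vis_out = layer_vis(vis, lang)
```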