Chapters 
 Introduction 
 00:00 • 2min 
 Using Pre-Trained Transformers to Do Long Tail Image Processing 
 02:12 • 2min 
 The Multi-Modal Representation Is a Good Example 
 04:00 • 2min 
 Using Machine Translations to Resolve Ambiguities 
 05:42 • 2min 
 What Is the Language Embedding? 
 07:38 • 2min 
 Cross-Attention Blocks in Vision-Language Models 
 09:15 • 2min 
 The Self-Attention Layer of the Transformer 
 11:10 • 2min 
 How to Pretrain Your Model? 
 13:12 • 1min 
 The Intuition Behind Object Recognition Pretraining Tasks 
 14:41 • 1min 
 Do You Have a Feature Regression Task? 
 16:11 • 2min 
 Using Image Captioning and VQA Datasets for Pretraining 
 17:47 • 2min 
 Is There Room for a Latent Alignment Model? 
 19:46 • 2min 
 Do You Have a Problem With Multiple Captions? 
 21:42 • 2min 
 Do You Have Any Overlapping Tasks in Your Model Training? 
 23:32 • 2min 
 A Question About High-Level Trends in the Paper 
 25:28 • 2min 
 Can You Give a Quick Summary of Your Results? 
 27:11 • 2min 
 Is It Possible to Pretrain a BERT-Like Model? 
 29:03 • 2min 
 The Differences Between LXMERT and Other Multimodal Transformer Papers 
 30:47 • 2min 
 The Best Way to Train a Vision-Plus-Language Encoder? 
 32:22 • 2min 
 Teaching Children to Learn Language 
 34:07 • 2min 
 Cross-Modal Alignments 
 35:44 • 2min 

