

Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?
Mar 23, 2022
Chapters
Introduction
00:00 • 2min
What Kinds of Weird Things Do Language Models Memorize?
01:44 • 2min
How Does Autoregressive Entity Linking Work?
03:47 • 2min
Is There an Index in a Transformer Model?
05:54 • 2min
Is There a Differentiable Search Index?
07:50 • 2min
Indexing and Retrieval in a Data Structure
09:21 • 2min
Using Documents in the Indexing Phase?
11:37 • 2min
Indexing
13:10 • 2min
Direct Indexing Using the First 32 Tokens of a Document
14:47 • 2min
Is There a Capacity Issue With Document IDs?
16:22 • 2min
Is This Really Semantically Structured?
18:28 • 2min
Is It a Small Eight-Layer BERT Model?
20:36 • 2min
Indexing a Corpus Using BM25?
22:22 • 2min
Using T5 Models on Natural Questions
24:21 • 2min
Using Hits@1 and Hits@10
26:09 • 2min
How Does the Interaction Between Model Size and Corpus Size Change With Scale?
28:13 • 2min
Is the Semantic String Doc ID the Best?
30:00 • 3min
Semantic String Doc ID Is Better Than Atomic Doc ID
32:56 • 3min
Is T5 a Better Encoder?
35:29 • 2min
BM25
37:08 • 2min
Zero-Shot Transfer
39:34 • 2min
The Semantic String Doc ID vs Atomic Doc ID
41:32 • 2min
A Few Notes on Future Work and Applications
43:22 • 2min
Using the Model as a Way to Store the Documents
45:28 • 2min
Is There a Space Where This Could Be a Thing?
47:36 • 2min
The Sweet Spot for a BM25 Model?
49:28 • 2min
Are the Failure Modes a Problem for Dual Encoders?
51:38 • 2min
Is There a Constant Storage Cost Every Time You Add a Document?
53:22 • 3min
Is It Possible to Get Zero-Shot Performance?
55:54 • 1min
Is It a T5 Model?
57:22 • 2min
Using Titles as Document IDs?
59:49 • 2min