

Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?
Mar 23, 2022
Chapters
Introduction
00:00 • 2min
What Kinds of Weird Things Do Language Models Memorize?
01:44 • 2min
How Does Autoregressive Entity Linking Work?
03:47 • 2min
Is There an Index in a Transformer Model?
05:54 • 2min
Is There a Differentiable Search Index?
07:50 • 2min
Indexing and Retrieval in a Data Structure
09:21 • 2min
Using Documents in the Indexing Phase?
11:37 • 2min
Indexing
13:10 • 2min
Direct Indexing Using the First 32 Tokens of a Document
14:47 • 2min
Is There a Capacity Issue With Document IDs?
16:22 • 2min
Is This Really Semantically Structured?
18:28 • 2min
Is It a Small Eight-Layer BERT Model?
20:36 • 2min
Indexing a Corpus Using BM25?
22:22 • 2min
Using T5 Models on Natural Questions
24:21 • 2min
Using Hits@1 and Hits@10
26:09 • 2min
How Does the Interaction Between Model Size and Corpus Size Change With Scale?
28:13 • 2min
Is the Semantic String Doc ID the Best?
30:00 • 3min
Semantic String Doc ID Is Better Than Atomic Doc ID
32:56 • 3min
Is T5 a Better Encoder?
35:29 • 2min
BM25
37:08 • 2min
Zero-Shot Transfer
39:34 • 2min
The Semantic String Doc ID vs Atomic Doc ID
41:32 • 2min
A Few Notes on Future Work and Applications
43:22 • 2min
Using the Model as a Way to Store the Documents
45:28 • 2min
Is There a Space Where This Could Be a Thing?
47:36 • 2min
The Sweet Spot for a BM25 Model?
49:28 • 2min
Are the Failure Modes a Problem for Dual Encoders?
51:38 • 2min
Is There a Constant Storage Cost Every Time You Add a Document?
53:22 • 3min
Is It Possible to Get Zero-Shot Performance?
55:54 • 1min
Is It a T5 Model?
57:22 • 2min
Using Titles as Document IDs?
59:49 • 2min