
Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?
Neural Search Talks — Zeta Alpha
00:00
A Few Notes on the Future Work Applications
In some cases the semantic string docids are just totally failing on the two larger subsets. There may be some optimization issue there, but I don't know. The other thing I thought was worth pointing out is the difference in the document representations that they index. We've been talking about what they call direct indexing, which is just taking the first L tokens from a document, and it's kind of a similar case. But if you increase that too much, if you take the first 128 tokens, you take a really big performance hit: something like going from 22 Hits@1 down to 13 or 14 Hits@1, very roughly. We're talking about Figure 4, if you have the paper.
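To make the "direct indexing" representation discussed here concrete, here is a minimal sketch, not the authors' code, of how one might build DSI-style indexing examples that map only the first L tokens of each document to its docid string. The helper names and the toy tokenizer are hypothetical stand-ins (a real setup would use a T5 tokenizer):

```python
# Sketch of DSI-style "direct indexing": each document contributes one
# training pair (first-L-tokens prefix -> docid string) for a seq2seq model.
from typing import Iterable, Tuple

def direct_indexing_pairs(docs: Iterable[Tuple[str, str]], tokenizer, first_l: int = 32):
    """Yield (input_text, docid) pairs using only the first `first_l` tokens of each doc."""
    for docid, text in docs:
        token_ids = tokenizer.encode(text)[:first_l]   # truncate to the first L tokens
        prefix = tokenizer.decode(token_ids)           # back to text for the encoder input
        yield prefix, docid                            # the model learns: prefix -> docid

# Toy whitespace tokenizer, only for illustration.
class ToyTokenizer:
    def encode(self, text): return text.split()
    def decode(self, ids): return " ".join(ids)

pairs = list(direct_indexing_pairs(
    [("doc_42", "transformer memory as a differentiable search index ...")],
    ToyTokenizer(), first_l=32))
print(pairs)
```

The point from the discussion is that `first_l` matters: the figure referenced suggests that pushing it up (e.g. to 128) can noticeably hurt Hits@1 rather than help.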