Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?

Neural Search Talks — Zeta Alpha

Semantic String Doc ID Is Better Than Atomic String ID

I find it hard to wrap my head around this idea that the model would just learn one separate token for each document and that's it. I think I could imagine a world where atomic doc ID is better than that with perfect optimization. So at least to me, the big takeaway is semantic string doc ID does seem to perform best in the large-corpus, large-model case. It's unclear to what degree things could be optimized further and the results would change based on that.
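To make the two schemes in this snippet concrete, here is a minimal Python sketch. It is an illustration only, not the method from the episode or the paper: the toy corpus, the hand-assigned cluster paths, and both function names are hypothetical, and it assumes every cluster index and in-cluster position fits in a single digit. An atomic doc ID gives each document its own dedicated output token, while a semantic string doc ID spells the ID as a digit sequence in which documents from the same cluster share a prefix.

```python
# Toy corpus (hypothetical): each document maps to a hand-assigned cluster path.
# In a real system the path would come from clustering document embeddings.
corpus = {
    "doc_a": (0, 1),
    "doc_b": (0, 1),
    "doc_c": (2, 0),
}


def atomic_ids(doc_names):
    """Assign one dedicated token per document (vocabulary grows with the corpus)."""
    return {name: f"<doc_{i}>" for i, name in enumerate(sorted(doc_names))}


def semantic_string_ids(cluster_paths):
    """Spell each ID as cluster-path digits plus a position inside the leaf cluster.

    Assumes every cluster index and position is a single digit (0-9).
    """
    next_position = {}  # leaf cluster path -> next free position in that cluster
    ids = {}
    for name in sorted(cluster_paths):
        path = cluster_paths[name]
        position = next_position.get(path, 0)
        next_position[path] = position + 1
        ids[name] = "".join(str(d) for d in path) + str(position)
    return ids


atomic = atomic_ids(corpus)
semantic = semantic_string_ids(corpus)

# Atomic: as many distinct output tokens as there are documents.
atomic_vocab_size = len(set(atomic.values()))
# Semantic string: only the digit tokens that appear in any ID.
semantic_vocab = sorted(set("".join(semantic.values())))

print(atomic)
print(semantic)
print(atomic_vocab_size, semantic_vocab)
```

In this toy setup, `doc_a` and `doc_b` sit in the same cluster and so share the prefix `01`, which is the kind of structure a model could exploit when decoding an ID digit by digit. The atomic scheme has no such shared structure: each document is an unrelated token.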
