Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?

Neural Search Talks — Zeta Alpha

Is There a Capacity Issue With Document IDs?

I assume at some point there will be capacity issues, right? You can't scale this up to 500 tokens and have a very small transformer. So I don't know how all these things interact, but maybe they kind of want to establish that this paradigm can work without having a massive model that's somehow capable of remembering very large documents. Yeah. The downside to the one-token-per-document approach is that it's like having a dictionary that is as big as the corpus. Exactly. Or like a hyper-large classification layer. Right. That isn't how word piece vocabularies normally work: here the vocab needs to increase by the number of documents they want to store. If you have 100,000 document IDs to store, now…
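To make the scaling concern concrete, here is a minimal sketch (not from the episode) of the atomic-docid setup being discussed: every document ID is added to the vocabulary as its own token, so the embedding table and the tied output classification layer both grow with the corpus. It assumes the Hugging Face transformers library and a T5 checkpoint, which is what the DSI paper builds on; the `<doc_i>` token format and `num_docs` value are illustrative choices.

```python
# Sketch of the "atomic docid" idea: one vocabulary token per document ID.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

num_docs = 100_000  # one new token per document ID to memorize
doc_id_tokens = [f"<doc_{i}>" for i in range(num_docs)]
tokenizer.add_tokens(doc_id_tokens)

# The embedding table (and, since T5 ties weights, the output softmax)
# grows by num_docs rows -- effectively a corpus-sized classification layer.
model.resize_token_embeddings(len(tokenizer))
print(model.get_input_embeddings().weight.shape)  # roughly (32k + 100k, d_model)
```

The point of the sketch is that, unlike a fixed ~32k word piece vocabulary, this output layer scales linearly with the number of documents indexed.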
