
Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!?
Neural Search Talks — Zeta Alpha
00:00
Exactly. Is It a Small Eight Layer Birth Model?
In practical terms, they use, I think, ten clusters, right? They start with ten clusters, and those, you know, give them a zero to nine for the first digit. But then they cluster recursively within each of these. So it kind of naturally translates into decimal thing, right? At least that's how I understand. Okay. Now let's talk about this differentiation between unsupervised pre-training and which leads to zero-shot sort of testing of the model versus fine-tuning on this. Yeah. We've talked about two of these training tasks, I guess we'll call them, right?
Transcript
Play full episode