ColBERT + ColBERTv2: late interaction at a reasonable inference cost

Neural Search Talks — Zeta Alpha

The Problem With Quantization and Dimensionality Reduction

So they take the top 1000 candidates for every query term, and then they return the top 1000 after re-ranking these. That's still a very big set of documents to re-rank. If you were talking about a cross-encoder or something, it's a lot. But here they've already loaded these documents into memory, so there's very little to do. They just need to do the full computation.
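To make the "full computation" concrete, here is a minimal sketch of a late-interaction re-ranking step, assuming the candidates' token embeddings are already in memory as unit-normalized NumPy arrays. The function names, the dictionary layout, and the toy sizes are hypothetical and chosen for illustration; this is not the ColBERT codebase, only the MaxSim-style scoring (for each query token, take its best-matching document token, then sum).

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the max similarity
    to any document token. Rows are assumed to be unit-normalized."""
    sim = query_emb @ doc_emb.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())


def rerank(query_emb, candidate_docs, top_k=1000):
    """Score every in-memory candidate and return the top_k (doc_id, score) pairs."""
    scored = [(doc_id, maxsim_score(query_emb, emb))
              for doc_id, emb in candidate_docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def unit_rows(rng, n, dim):
    """Random token embeddings with unit-length rows (toy data only)."""
    x = rng.normal(size=(n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
query = unit_rows(rng, 4, 8)  # 4 query tokens, 8 dimensions (toy sizes)

# Hypothetical candidate set, standing in for the union of per-term candidates.
candidates = {f"doc{i}": unit_rows(rng, int(rng.integers(5, 12)), 8)
              for i in range(20)}
# A document containing the query's own token vectors gets the maximum score.
candidates["exact"] = query.copy()

ranked = rerank(query, candidates, top_k=5)
```

Because each candidate costs only one small matrix product and a row-wise max, scoring even a large candidate set stays cheap once the embeddings are loaded, which is the point made in the snippet.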
