ColBERT + ColBERTv2: late interaction at a reasonable inference cost

Neural Search Talks - Zeta Alpha

The Problem With Quantization and Dimensionality Reduction

So they take the top 1000 candidates for every query term, and then they return the top 1000 after re-ranking these. That's still a very big set of documents to re-rank. If you were talking about a cross-encoder or something, it's a lot. But here they've already loaded these documents into memory, so there's very little to do. They just need to do the full computation.
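To make the two stages concrete, here is a minimal sketch of the flow described above: gather candidates per query term, then re-rank the union with the full late-interaction (MaxSim) score. It is an illustration under assumptions, not the ColBERT implementation: the function names, the toy 2-dimensional embeddings, and the brute-force scan standing in for an approximate nearest-neighbor index are all hypothetical. In the setting discussed, both `k_per_term` and `top_n` would be 1000.

```python
# Illustrative sketch only: names and toy data are hypothetical, and a
# brute-force scan stands in for the vector index a real system would use.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def candidates_per_term(query_vecs, docs, k):
    """Stage 1: for every query token, find the k most similar document
    token embeddings and collect the documents they belong to."""
    found = set()
    for q in query_vecs:
        scored = [(dot(q, d), doc_id)
                  for doc_id, vecs in docs.items() for d in vecs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        found.update(doc_id for _, doc_id in scored[:k])
    return found

def search(query_vecs, docs, k_per_term, top_n):
    """Stage 2: score every candidate with the full MaxSim computation
    and return the top_n document ids, best first."""
    cands = candidates_per_term(query_vecs, docs, k_per_term)
    ranked = sorted(cands,
                    key=lambda doc_id: maxsim(query_vecs, docs[doc_id]),
                    reverse=True)
    return ranked[:top_n]

# Toy data: each document is a list of token embeddings already held in memory.
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.9, 0.1], [0.8, 0.2]],
    "c": [[0.0, 0.3], [0.1, 0.1]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

top = search(query, docs, k_per_term=2, top_n=2)
```

The re-ranking step only reads token embeddings that are already in memory and takes dot products, which is why re-ranking a large candidate set is far cheaper here than running a cross-encoder over each query-document pair.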
