Indexing a Corpus Using BM25?

The ideal setup is a multi-task setup where they use both of these tasks in some proportion. So during training, sometimes the example is document to document ID and sometimes it's query to document ID. I think the former is 32 times more common, if I read a later figure correctly. But it's some mixture like this, where they do the first task more often. This is the supervised setting. The zero-shot setting just doesn't use the second task. They skip it there because it requires labeled data, right? It requires labels of which document is relevant to a query, and in that setting they assume they don't have them.
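To make the mixture concrete, here is a minimal sketch of how such training batches could be drawn. It assumes the 32:1 ratio recalled above, and the function name `sample_batch` and the `("index", ...)` / `("retrieve", ...)` tags are hypothetical illustrations, not the authors' code. Each example is drawn from the document-to-ID task with probability `ratio / (ratio + 1)`, otherwise from the labeled query-to-ID pairs. In the zero-shot setting there are no labeled queries, so every example comes from the first task.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical example format: (task_tag, input_text, target_doc_id)
Example = Tuple[str, str, str]


def sample_batch(
    docs: Dict[str, str],                # doc_id -> document text (indexing task)
    queries: List[Tuple[str, str]],      # (query, relevant doc_id) labeled pairs
    batch_size: int,
    ratio: int = 32,                     # assumed: indexing examples per retrieval example
    zero_shot: bool = False,             # zero-shot: no labeled queries, skip task 2
    rng: Optional[random.Random] = None,
) -> List[Example]:
    rng = rng or random.Random(0)
    doc_items = list(docs.items())
    # Probability of drawing a document -> doc ID example; 1.0 drops the query task.
    p_index = 1.0 if (zero_shot or not queries) else ratio / (ratio + 1)
    batch: List[Example] = []
    for _ in range(batch_size):
        if rng.random() < p_index:
            doc_id, text = rng.choice(doc_items)
            batch.append(("index", text, doc_id))        # document -> doc ID
        else:
            query, doc_id = rng.choice(queries)
            batch.append(("retrieve", query, doc_id))    # query -> doc ID
    return batch
```

With `ratio=32`, roughly 32 of every 33 sampled examples are document-to-ID; passing `zero_shot=True` yields only those, which matches the idea that the second task needs relevance labels.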
