
137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal
NLP Highlights
kNN-LMs - How Do They Work?
The kNN-LM formulation separates the problem of representation learning (or text similarity) from next-word prediction. The keys in the datastore are context representations, which we get by doing forward passes with our language model over the data that needs to be memorized; the values are the words that followed those contexts. Because the same value appears multiple times in the datastore, we can end up with duplicate values in our retrieved set as well. In this way, we get a kNN distribution over the same vocabulary as the LM's output distribution, where any word from the vocabulary that doesn't appear in the retrieved set simply gets a probability of zero.
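To make the mechanics concrete, here is a minimal sketch in NumPy of the steps described above. The `encode` argument stands in for the language model's forward pass that produces a context representation; the distance-based softmax weighting and the interpolation weight `lam` follow the kNN-LM formulation, but the function names, `k`, and hyperparameter values are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def build_datastore(contexts, next_tokens, encode):
    """Keys are context representations from LM forward passes over the
    data to be memorized; values are the tokens that followed them."""
    keys = np.stack([encode(c) for c in contexts])  # shape (N, d)
    values = np.asarray(next_tokens)                # shape (N,)
    return keys, values

def knn_distribution(query, keys, values, vocab_size, k=4):
    """Retrieve the k nearest keys and turn their values into a
    distribution over the full vocabulary. Any word that never appears
    in the retrieved set keeps probability zero."""
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 distance
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])             # softmax over negative distances
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p_knn[values[idx]] += w  # duplicate values pool their weights
    return p_knn

def knn_lm(p_lm, p_knn, lam=0.25):
    """Final next-word distribution: interpolate the parametric LM
    distribution with the retrieved kNN distribution."""
    return lam * p_knn + (1 - lam) * p_lm
```

In practice the datastore can hold billions of keys, so the exact brute-force search above is replaced by an approximate nearest-neighbor index (the kNN-LM work uses FAISS), but the distribution computation is the same.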