
137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal
NLP Highlights
The Second Experiment in the Paper Was Really Exciting
The researchers trained two models: one on the three billion tokens of Wikipedia, and a second one on just a hundred-million-token subset of the same data. They found that the second model, retrieving nearest neighbors from the full three-billion-token datastore, improved performance beyond training that same model on all three billion tokens directly. This kind of shows us that when the number of trainable parameters in the model is restricted, retrieving neighbors from the corpus can actually outperform training on it, which is an exciting result.
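For context, here is a minimal NumPy sketch of the nearest-neighbor language modeling idea the episode discusses: the base language model's next-token distribution is interpolated with a distribution built from the k nearest stored context vectors in a datastore. The function name, parameter names, and default values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def knn_lm_probs(query_vec, lm_probs, datastore_keys, datastore_values,
                 vocab_size, k=8, lam=0.25, temperature=1.0):
    """Sketch of a kNN-LM-style interpolation (names are illustrative).

    query_vec:        context vector for the current position, shape (d,)
    lm_probs:         base LM's next-token distribution, shape (vocab_size,)
    datastore_keys:   stored context vectors, shape (N, d)
    datastore_values: the token that followed each stored context, shape (N,)
    """
    # Squared L2 distance from the query context to every stored key.
    dists = np.sum((datastore_keys - query_vec) ** 2, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors

    # Softmax over negative distances to weight the retrieved neighbors.
    logits = -dists[nn] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Aggregate neighbor weights onto the tokens they predict.
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_values[nn], weights)

    # Final distribution: fixed interpolation of retrieval and the base LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

In the experiment described above, the datastore would be built over the full three billion tokens while the model itself was trained on only the hundred-million-token subset.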