
137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal
NLP Highlights
Do You Think Memorization Could Be Helpful in a Small Model Setting?
In this experiment, we used the Toronto Books Corpus, and we found that when we augment the base LM, which was trained on three billion tokens of Wikipedia, it actually helps to improve performance by a pretty large margin. So using a single model, we can generalize to different domains simply by swapping out which datastore is used with the LM.

Right. Do you think that after a point you wouldn't need any more data to learn good representations? Do you think that's possible?

I think for the small-model setting, this seems to be true based on this experiment. For larger models, it starts to get into murky territory.
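As a rough illustration of the datastore-swapping idea discussed above, here is a minimal sketch of the kNN-LM interpolation: the base LM's next-token distribution is mixed with a distribution built from the nearest neighbors in a datastore of (context representation, next token) pairs. The function name, the brute-force search, and the parameter values (k, lam) are illustrative assumptions, not the authors' implementation; real systems typically use an approximate index such as FAISS over millions of keys, and domain adaptation corresponds to swapping in a different set of datastore keys and values.

```python
import numpy as np

def knn_lm_probs(context_vec, lm_probs, datastore_keys, datastore_values,
                 vocab_size, k=8, lam=0.25, temperature=1.0):
    """Interpolate base-LM next-token probabilities with a kNN distribution
    built from a datastore of (context representation, next token) pairs.

    Hypothetical sketch: argument names and defaults are illustrative.
    """
    # Squared L2 distances from the query context to every stored key
    dists = np.sum((datastore_keys - context_vec) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Turn negative distances into a distribution over the k neighbors
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()

    # Aggregate neighbor weights onto the next tokens they recorded
    knn_probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        knn_probs[datastore_values[idx]] += w

    # Final distribution: lambda * kNN + (1 - lambda) * base LM
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

Swapping domains in this sketch means passing a different `datastore_keys` / `datastore_values` pair while leaving the base LM untouched.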