
137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal
NLP Highlights
Do You Think Memorization Could Be Helpful in a Small Model Setting?
In this experiment, we used the Toronto Books Corpus, and we found that when we augment the base LM, which was trained on three billion tokens of Wikipedia, it actually helps to improve performance by a pretty large margin. So using a single model, we can generalize to different domains simply by swapping out which datastore is used with the LM.

Right. Do you think that after a point you wouldn't need any more data to learn good representations? Do you think that's possible?

I think for the small-model setting, this seems to be true based on this experiment. For larger models, it starts to get into murky territory.
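As a rough illustration of the datastore-swapping idea discussed above, here is a minimal sketch of the kNN-LM interpolation: the base LM's next-token distribution is mixed with a distribution built from the nearest neighbors in a datastore of (context representation, next token) pairs. The function name, the brute-force search, and the parameter values (k, lam) are illustrative assumptions, not the authors' implementation; real systems typically use an approximate index such as FAISS over millions of keys, and domain adaptation corresponds to swapping in a different set of datastore keys and values.

```python
import numpy as np

def knn_lm_probs(context_vec, lm_probs, datastore_keys, datastore_values,
                 vocab_size, k=8, lam=0.25, temperature=1.0):
    """Interpolate base-LM next-token probabilities with a kNN distribution
    built from a datastore of (context representation, next token) pairs.

    Hypothetical sketch: argument names and defaults are illustrative.
    """
    # Squared L2 distances from the query context to every stored key
    dists = np.sum((datastore_keys - context_vec) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Turn negative distances into a distribution over the k neighbors
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()

    # Aggregate neighbor weights onto the next tokens they recorded
    knn_probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        knn_probs[datastore_values[idx]] += w

    # Final distribution: lambda * kNN + (1 - lambda) * base LM
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

Swapping domains in this sketch means passing a different `datastore_keys` / `datastore_values` pair while leaving the base LM untouched.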