
137 - Nearest Neighbor Language Modeling and Machine Translation, with Urvashi Khandelwal
NLP Highlights
The Second Experiment in the Paper Was Really Exciting
The researchers trained two models: one on the three billion tokens of Wikipedia, and a second one on just a hundred-million-token subset of the same data. They found that the second model, retrieving nearest neighbors from the full three-billion-token datastore, improved performance beyond training that same model on all three billion tokens directly. This kind of shows us that when the number of trainable parameters in the model is restricted, retrieving neighbors from the corpus can actually outperform training on it, which is an exciting result.
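For context, here is a minimal NumPy sketch of the nearest-neighbor language modeling idea the episode discusses: the base language model's next-token distribution is interpolated with a distribution built from the k nearest stored context vectors in a datastore. The function name, parameter names, and default values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def knn_lm_probs(query_vec, lm_probs, datastore_keys, datastore_values,
                 vocab_size, k=8, lam=0.25, temperature=1.0):
    """Sketch of a kNN-LM-style interpolation (names are illustrative).

    query_vec:        context vector for the current position, shape (d,)
    lm_probs:         base LM's next-token distribution, shape (vocab_size,)
    datastore_keys:   stored context vectors, shape (N, d)
    datastore_values: the token that followed each stored context, shape (N,)
    """
    # Squared L2 distance from the query context to every stored key.
    dists = np.sum((datastore_keys - query_vec) ** 2, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest neighbors

    # Softmax over negative distances to weight the retrieved neighbors.
    logits = -dists[nn] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Aggregate neighbor weights onto the tokens they predict.
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_values[nn], weights)

    # Final distribution: fixed interpolation of retrieval and the base LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

In the experiment described above, the datastore would be built over the full three billion tokens while the model itself was trained on only the hundred-million-token subset.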