How to Scale Up Large Text Corpora

A default strategy for using large, unlabelled text corpora is left-to-right language modelling. But I think there are many other tasks that you could hypothetically do from such a dataset. For example, there are tons of hyperlinks out on the internet. And if I take a random hyperlink and ask you to predict what URL it goes to, you can get lots of information out of it. That usually seems to work well with the scaling factor.
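As a rough illustration of the hyperlink idea, the sketch below turns raw HTML into (anchor text, target URL) pairs that could serve as inputs and prediction targets. It uses only Python's standard `html.parser`; the names `LinkPairExtractor` and `extract_link_pairs` are hypothetical, and the speaker does not describe any particular pipeline, so treat this as one possible way to build such examples rather than the method discussed.

```python
from html.parser import HTMLParser


class LinkPairExtractor(HTMLParser):
    """Collects (anchor text, target URL) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((text, self._href))
            self._href = None


def extract_link_pairs(html):
    """Return a list of (anchor_text, url) training pairs (hypothetical helper)."""
    parser = LinkPairExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Each pair could then be framed as a prediction task: given the anchor text (and possibly the surrounding context), predict the URL it points to.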

