
Episode 01: Kelvin Guu, Google AI, on language models & overlooked research problems
Generally Intelligent
00:00
How to Scale Up Large Text Corpora
A default strategy for using large, unlabelled text corpora is left-to-right language modelling. But I think there are many other tasks that you could hypothetically do with such a data set. For example, there are tons of hyperlinks out on the internet. And if I take a random hyperlink and ask you to predict what URL it goes to, you can get lots of information out of it. That usually seems to work well with the scaling factor.
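To make the hyperlink idea concrete, here is a minimal sketch of how such training pairs might be pulled out of raw HTML: the anchor text becomes the input and the link's URL becomes the prediction target. This is an illustrative assumption, not the method described in the episode. `LinkExampleExtractor` and `hyperlink_examples` are hypothetical names, and the sketch uses only Python's standard `html.parser` module. A real pipeline would likely also keep the text surrounding each link as extra context.

```python
from html.parser import HTMLParser


class LinkExampleExtractor(HTMLParser):
    """Hypothetical helper: collects (anchor_text, target_url) pairs.

    Each pair is one self-supervised example where a model would see the
    anchor text and be asked to predict the URL the link points to.
    """

    def __init__(self):
        super().__init__()
        self.examples = []      # list of (anchor_text, url) tuples
        self._href = None       # href of the <a> tag currently open, if any
        self._text_parts = []   # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; links without href are skipped
            self._href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so nested tags and line breaks don't leak through
            text = " ".join("".join(self._text_parts).split())
            if text:
                self.examples.append((text, self._href))
            self._href = None


def hyperlink_examples(html):
    """Return (anchor_text, url) training pairs found in an HTML string."""
    parser = LinkExampleExtractor()
    parser.feed(html)
    parser.close()
    return parser.examples


page = '<p>See the <a href="https://en.wikipedia.org/wiki/Language_model">language model</a> article.</p>'
print(hyperlink_examples(page))
```

Here the single link yields one example, with "language model" as the input and the Wikipedia URL as the target. Because every page on the web carries such links, the supply of labels grows with the crawl itself, which is what makes the task attractive at scale.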