Episode 01: Kelvin Guu, Google AI, on language models & overlooked research problems

Generally Intelligent

How to Scale Up Large Text Corpora

A default strategy for using large, unlabelled text corpora is left-to-right language modelling. But I think there are many other tasks that you could hypothetically do from such a data set. For example, there are tons of hyperlinks out on the internet. And if I take a random hyperlink and ask you to predict what URL it goes to, you can get lots of information out of it. That usually seems to work well with the scaling factor.
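
To make the hyperlink idea concrete, here is a minimal sketch of how raw HTML might be turned into link-prediction training examples: the text before a link is the input, and the link's target is the label. This is not a method described in the episode. The class `LinkExampleExtractor`, the function `link_prediction_examples`, the dictionary keys, and the example.org URLs are all hypothetical names chosen for illustration, and a real pipeline would also need deduplication, URL normalisation, and a model.

```python
from html.parser import HTMLParser


class LinkExampleExtractor(HTMLParser):
    """Collects (context, anchor, target) examples from an HTML string.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.text_so_far = []     # visible text seen before the current link
        self.current_href = None  # href of the <a> tag we are inside, if any
        self.anchor_text = []     # text inside the current <a> tag
        self.examples = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
            self.anchor_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.anchor_text.append(data)
        else:
            self.text_so_far.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            anchor = "".join(self.anchor_text)
            # Collapse whitespace so the context is a clean single line.
            context = " ".join("".join(self.text_so_far).split())
            self.examples.append(
                {
                    "context": context,           # model input
                    "anchor": anchor.strip(),     # optional extra input
                    "target": self.current_href,  # label to predict
                }
            )
            # The anchor text becomes part of the context for later links.
            self.text_so_far.append(anchor)
            self.current_href = None


def link_prediction_examples(html):
    """Return one training example per hyperlink found in `html`."""
    parser = LinkExampleExtractor()
    parser.feed(html)
    parser.close()
    return parser.examples


page = (
    'See the <a href="https://example.org/bert">BERT paper</a> '
    'and <a href="https://example.org/gpt">GPT</a>.'
)
examples = link_prediction_examples(page)
```

Each example pairs the text leading up to a link with the URL it points to, so no human labelling is needed: the supervision comes from the links people have already written.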
