How Many Domains Did You Find That Are No Longer Valid?
Nicholas: We looked at some datasets like Conceptual Captions 3M, which was constructed in 2018, and there we get an even higher rate of expiration. It suggests that people are going to start crawling the dark corners of the internet in order to be able to find more of it. We haven't done too much on this for large language models, but the language model case should be possible.