The Future of Mass Language Modeling
Twitter has recently published a paper on indexing using ColBERT, which is a token-level representation approach they call late interaction. They put the tweet's doc ID at the end, and as a new search for tweets comes in they read backwards, so basically they encoded the temporal nature of tweets to keep results fresh. The majority of users will only use minutia or SPLADE vectors, but if you have 5,000 direct messages scrolling through a day, each search will take half an hour. I was just thinking, ten years ago at Berlin Buzzwords there...
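The late-interaction scoring mentioned here can be made concrete with a small sketch. ColBERT keeps one embedding per token and scores a document by taking, for each query token, the maximum similarity over all document tokens (MaxSim), then summing those maxima. The snippet below is a minimal illustration, not Twitter's actual implementation: the function name `maxsim_score`, the random toy embeddings, and the assumption that doc IDs are assigned in arrival order (so iterating in descending ID order visits the freshest tweets first, loosely mirroring the temporal trick described above) are all hypothetical.

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_embs: (num_query_tokens, dim) L2-normalised token embeddings
    doc_embs:   (num_doc_tokens, dim)   L2-normalised token embeddings
    Each query token takes its max cosine similarity over all document
    tokens; the per-token maxima are summed into one relevance score.
    """
    sim = query_embs @ doc_embs.T           # (q_tokens, d_tokens) cosine similarities
    return float(sim.max(axis=1).sum())     # MaxSim per query token, then sum

# Toy usage with random vectors; a real system would use a trained encoder.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)

# Documents keyed by doc ID; assume IDs are assigned in arrival order.
docs = {i: rng.normal(size=(30, 128)) for i in range(5)}
for i in docs:
    docs[i] /= np.linalg.norm(docs[i], axis=1, keepdims=True)

# Traverse newest-first so the freshest documents are scored first.
for doc_id in sorted(docs, reverse=True):
    print(doc_id, round(maxsim_score(q, docs[doc_id]), 3))
```

The key design point is that late interaction defers the query-document interaction to this cheap MaxSim step over precomputed token embeddings, which is why scoring many candidates stays tractable compared with running a full cross-encoder per document.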