
Open-Source AI with Vinod Valloppillil and Bob van Luijt - Weaviate Podcast #86!
Weaviate Podcast
Advancements in Language Architecture and Content Chunking
Today's language model stack is made up of distinct functional elements (chunking, embedding, retrieval, generation), each implemented and allocated differently. Content chunking is currently heuristic-based and hacky, but it could become a learned behavior, co-trained with the embedding and language models. Startups like Contextual AI are exploring these scenarios. Tokenization followed a similar path, moving from hand-written rules to a model-based approach, and comparable advances are expected in retrieval and generation. Fully end-to-end tokenization is not yet widely adopted, however, and byte pair encoding remains predominant.
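To make "heuristic-based" chunking concrete, here is a minimal sketch of one common heuristic: fixed-size word windows with overlap. The function name `chunk_words` and the `max_words` and `overlap` defaults are illustrative assumptions, not something discussed in the episode.

```python
def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping, fixed-size word windows.

    Purely heuristic: the window size and overlap are hand-picked
    (hypothetical defaults), not learned from data.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, max_words=4, overlap=1):
        print(chunk)
```

Nothing here looks at meaning: boundaries fall wherever the word count says they should. A learned chunker, as suggested in the snippet, would instead choose boundaries jointly with the embedding and language models.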