Advancements in Language Architecture and Content Chunking

Open-Source AI with Vinod Valloppillil and Bob van Luijt - Weaviate Podcast #86!

Weaviate Podcast

NOTE

Current language model architectures contain functional elements that are implemented and allocated in different ways. Content chunking is currently heuristic-based and hacky, but it could become a learned behavior that is co-trained with the embedding and language models. Startups like Contextual AI are exploring these scenarios. Tokenization has moved from a hacky process toward a model-based approach, and similar advances are expected in the retrieval and generation processes. End-to-end learned tokenization is not yet widely adopted, though, and byte pair encoding remains predominant.
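To make "heuristic-based" chunking concrete, here is a minimal Python sketch of one common heuristic: fixed-size word windows with overlap. It is an illustration only, not a method described in the episode; the function name `chunk_words` and the `size` and `overlap` parameters are assumptions chosen for this example.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()      # naive whitespace split, itself a heuristic
    step = size - overlap     # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break             # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", size=4, overlap=1):
        print(chunk)
```

The window size and overlap here are fixed by hand and ignore sentence or topic boundaries, which is the kind of choice a learned chunker would make from data instead.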
