Building Production-Grade RAG at Scale

Jun 26, 2025

Douwe Kiela, founder and CEO of Contextual AI and an adjunct professor at Stanford, discusses why Retrieval-Augmented Generation (RAG) remains relevant as context windows grow. He explains the shift to RAG 2.0, in which the pipeline is treated as a single end-to-end trainable system. The conversation covers the challenges of document understanding, the importance of structured information in extraction, and how hybrid retrieval methods can streamline data access. Douwe also speculates on future advances in model fine-tuning, stressing the need for expert feedback and open-source contributions.

INSIGHT

RAG Still Essential Despite Long Contexts

Larger context windows alone do not make RAG obsolete: filling a long context wastes compute, and model performance degrades once the context grows beyond a few thousand tokens.
Combining RAG with long context models is more efficient and effective than relying solely on huge context windows.
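
The idea of pairing retrieval with a bounded context can be sketched as follows. This is a minimal illustration, not the method described in the episode: the function names (tokenize, cosine, build_context), the bag-of-words similarity, and the word-count "token budget" are all assumptions chosen to keep the example self-contained. A real system would use learned embeddings and the model's own tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer; a stand-in for a real model tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(query, passages, token_budget):
    """Rank passages by similarity to the query, then keep the best ones that fit the budget."""
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    selected, used = [], 0
    for passage in ranked:
        cost = len(tokenize(passage))
        if used + cost > token_budget:
            continue  # skip passages that would overflow the budget
        selected.append(passage)
        used += cost
    return selected


passages = [
    "The invoice total is due in 30 days.",
    "Our office is closed on public holidays.",
    "Late invoice payments incur a 2 percent fee.",
]
context = build_context("When is the invoice due?", passages, token_budget=8)
```

Only the selected passages are sent to the model, so the prompt stays short regardless of how large the underlying corpus is.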

INSIGHT

RAG 2.0 Is an End-to-End System

RAG should be treated as a fully integrated system optimized end-to-end rather than a set of disconnected components.
Joint optimization of extraction, retrieval, re-ranking, and generation improves accuracy and reliability.
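
As one concrete example of how separate retrieval components are combined, the hybrid retrieval mentioned in the episode description can be sketched with reciprocal rank fusion (RRF). This is an assumption for illustration: the episode does not say which fusion method is used, and the function name, the document IDs, and the constant k=60 (a commonly used default) are hypothetical. The sketch shows hand-tuned fusion, not the joint training that RAG 2.0 refers to.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) in every list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_ranking = ["a", "b", "c"]  # hypothetical output of a keyword retriever
vector_ranking = ["b", "c", "a"]   # hypothetical output of an embedding retriever
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

In an end-to-end system, a fixed rule like this would be replaced or complemented by components trained together against the final answer quality.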

ADVICE

Optimize Early Pipeline Stages

Optimize document extraction by using layout-aware segmentation and machine-learned chunking tailored to document structure.
Choose embeddings that are jointly optimized with the retrieval and ranking components for best RAG performance.
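
Structure-aware chunking can be illustrated with a simple rule-based sketch. It is an assumption-laden stand-in for the machine-learned chunking the snip recommends: the function name chunk_by_headings, the use of "#" to mark headings, and the max_words limit are all hypothetical. The point it shows is that chunks never cross a section boundary and each chunk keeps its heading as metadata.

```python
def chunk_by_headings(lines, max_words=50):
    """Group lines into chunks that never cross a heading and stay near max_words."""
    chunks, heading, body, count = [], None, [], 0

    def flush():
        # Emit the current chunk (if any) under the current heading.
        nonlocal body, count
        if body:
            chunks.append({"heading": heading, "text": " ".join(body)})
        body, count = [], 0

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            flush()  # close the previous section before starting a new one
            heading = line.lstrip("#").strip()
            continue
        words = len(line.split())
        if body and count + words > max_words:
            flush()  # start a new chunk within the same section
        body.append(line)
        count += words
    flush()
    return chunks


document = [
    "# Payment terms",
    "Invoices are due in 30 days.",
    "",
    "# Support",
    "Email us for help.",
    "We reply within one day.",
]
chunks = chunk_by_headings(document, max_words=5)
```

A single line longer than max_words is kept whole rather than split mid-sentence, which is a deliberate simplification in this sketch.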
