The Data Exchange with Ben Lorica

Building Production-Grade RAG at Scale

Jun 26, 2025
Douwe Kiela, founder and CEO of Contextual AI and an adjunct professor at Stanford, discusses why Retrieval-Augmented Generation (RAG) remains relevant as context windows grow. He explains the shift to RAG 2.0, which treats RAG as an end-to-end trainable system. The conversation covers the challenges of document understanding, the importance of preserving structure during extraction, and how hybrid retrieval methods can streamline data access. Douwe also speculates on future advances in model fine-tuning, stressing the need for expert feedback and open-source contributions.
INSIGHT

RAG Still Essential Despite Long Contexts

  • Expanding context windows alone do not make RAG obsolete because long contexts waste compute and degrade performance beyond a few thousand tokens.
  • Combining RAG with long context models is more efficient and effective than relying solely on huge context windows.
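As a rough illustration of the point above, the sketch below selects only the chunks that match a query and fit a token budget, instead of passing every document to the model. It is a minimal sketch under stated assumptions: the word-overlap scorer, the whitespace tokenizer, and the names `select_context` and `token_budget` are hypothetical stand-ins, not anything described in the episode. A production system would use a learned retriever and the model's own tokenizer.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokens stand in for real model tokens.
    return text.lower().split()


def overlap_score(query, chunk):
    # Count query tokens that also appear in the chunk (toy relevance score).
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum(min(q[t], c[t]) for t in q)


def select_context(query, chunks, token_budget):
    """Return the best-matching chunks that fit within token_budget."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if overlap_score(query, chunk) == 0:
            break  # remaining chunks share no tokens with the query
        size = len(tokenize(chunk))
        if used + size > token_budget:
            continue  # skip chunks that would exceed the budget
        selected.append(chunk)
        used += size
    return selected, used


chunks = [
    "the refund policy allows returns within 30 days",
    "our office is closed on public holidays",
    "refund requests need the original receipt",
]
context, used = select_context("refund policy returns", chunks, token_budget=20)
```

The selected chunks can then be placed in a long-context prompt, so the model reads a small relevant subset rather than the whole corpus.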
INSIGHT

RAG 2.0 Is an End-to-End System

  • RAG should be treated as a fully integrated system optimized end-to-end rather than a set of disconnected components.
  • Joint optimization of extraction, retrieval, re-ranking, and generation improves accuracy and reliability.
ADVICE

Optimize Early Pipeline Stages

  • Optimize document extraction by using layout-aware segmentation and machine-learned chunking tailored to document structure.
  • Choose embeddings that are jointly optimized with the retrieval and ranking components for best RAG performance.
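To make the chunking advice concrete, here is a minimal sketch of structure-aware chunking. It is a simple rule-based stand-in, not the layout-aware or machine-learned approach discussed in the episode: it assumes headings are marked with a leading `#` and paragraphs are separated by blank lines, and the names `split_sections`, `chunk_sections`, and `max_words` are hypothetical. The idea it shows is that chunks never cross a section boundary and each chunk carries its heading.

```python
def split_sections(text):
    """Group paragraphs under the most recent heading (assumed '#'-prefixed)."""
    sections = []
    heading, paragraphs = "", []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            if paragraphs:
                sections.append((heading, paragraphs))
            heading, paragraphs = block.lstrip("# ").strip(), []
        else:
            paragraphs.append(block)
    if paragraphs:
        sections.append((heading, paragraphs))
    return sections


def chunk_sections(sections, max_words=40):
    """Pack whole paragraphs into chunks without crossing section boundaries."""
    chunks = []
    for heading, paragraphs in sections:
        current, count = [], 0
        for paragraph in paragraphs:
            size = len(paragraph.split())
            # Start a new chunk when adding this paragraph would exceed the limit.
            if current and count + size > max_words:
                chunks.append({"heading": heading, "text": "\n\n".join(current)})
                current, count = [], 0
            current.append(paragraph)
            count += size
        if current:
            chunks.append({"heading": heading, "text": "\n\n".join(current)})
    return chunks


doc = "# Intro\n\nalpha beta\n\ngamma\n\n# Methods\n\none two three\n\nfour five six"
chunks = chunk_sections(split_sections(doc), max_words=4)
```

Keeping the heading with each chunk gives the retriever and the generator some of the document structure that plain fixed-size splitting discards.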