

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
Aug 29, 2025
Shreya Shankar, a PhD candidate at UC Berkeley with prior experience at Google Brain and Facebook, dives into the world of AI agents and large-scale document processing. She explains how LLMs can handle vast amounts of data accurately without breaking the bank. Topics include the importance of human review of errors, turning ad hoc LLM workflows into reliable pipelines, and the trade-off between cheap and expensive models. Shreya also discusses how guardrails and structured approaches can make LLM outputs dependable in real-world applications.
AI Snips
Treat LLM Workflows As ETL Pipelines
- Treat LLM workflows as ETL: map operators extract attributes from each document, and reduce operators aggregate or summarize across documents (see the sketch after this list).
- Search for the most accurate pipeline first, then optimize for cost while still meeting that accuracy bar.
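A minimal Python sketch of that ETL framing, assuming a hypothetical call_llm(prompt) helper in place of a real model client; the function names and prompts here are illustrative, not the episode's actual API:

```python
import json

# Hypothetical helper: wrap whatever LLM client you actually use behind one function.
def call_llm(prompt: str) -> str:
    """Send a prompt to a model and return its raw text response (stub)."""
    raise NotImplementedError("plug in your model client here")

def extract_themes(doc: str) -> list[str]:
    """Map operator: extract per-document attributes (here, a list of themes)."""
    prompt = (
        "List the main themes in the following document "
        "as a JSON array of strings.\n\n" + doc
    )
    return json.loads(call_llm(prompt))

def summarize_themes(themes_per_doc: list[list[str]]) -> str:
    """Reduce operator: aggregate the mapped outputs into one summary."""
    flat = [theme for themes in themes_per_doc for theme in themes]
    prompt = "Summarize the recurring themes in this list:\n" + "\n".join(
        f"- {theme}" for theme in flat
    )
    return call_llm(prompt)

def run_pipeline(docs: list[str]) -> str:
    """Accuracy-first pipeline: map over every document, then reduce."""
    return summarize_themes([extract_themes(doc) for doc in docs])
```

Once a pipeline like this hits the accuracy target, the cost optimization is typically swapping cheaper models into individual operators and re-checking accuracy, rather than redesigning the pipeline.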
Use Guardrails For Flaky LLM Outputs
- Add retries, code-based validators, and cheap LLM checks to catch flaky outputs such as empty strings or garbage.
- Use 'gleaning' to re-run or validate until outputs satisfy simple properties like a minimum theme count (see the sketch after this list).
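A rough sketch of the gleaning idea, continuing the map operator from the sketch above: a code-based validator checks simple properties (non-empty strings, a minimum theme count), and the extraction is re-run with failure feedback until it passes or a retry budget runs out. The threshold of 3 themes, the feedback wording, and the retry count are illustrative assumptions; a cheap LLM check could stand in for or supplement the code-based validator.

```python
import json

def passes_checks(themes: list[str], min_themes: int = 3) -> bool:
    """Code-based validator: non-empty strings and at least min_themes items.
    The threshold is an illustrative assumption."""
    return len(themes) >= min_themes and all(
        isinstance(t, str) and t.strip() for t in themes
    )

def extract_themes_with_gleaning(doc: str, max_attempts: int = 3) -> list[str]:
    """Re-run the map operator (extract_themes from the sketch above) until its
    output satisfies passes_checks, feeding a failure note back into the prompt."""
    feedback = ""
    for _ in range(max_attempts):
        try:
            themes = extract_themes(doc + feedback)
        except json.JSONDecodeError:
            themes = []  # treat garbage (non-JSON) output as a failed attempt
        if passes_checks(themes):
            return themes
        feedback = (
            "\n\n(The previous answer was empty, malformed, or too short; "
            "return at least 3 distinct themes as a JSON array.)"
        )
    raise ValueError("LLM output never met the validation criteria")
```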
Statewide Police Records Project
- Berkeley's California Police Records Access Project used LLMs to build a police-misconduct database that would have taken humans an estimated 35 years to compile.
- The project required careful prompt specifications, repeated iteration, and intern-led error analysis before full deployment.