
How to Build and Optimize AI Research Agents
The Data Exchange with Ben Lorica
Evaluating and Benchmarking Deep Research
Ben asks how to evaluate deep research systems; Jakub outlines building expert-calibrated benchmarks and automated LLM judges to support iterative improvement.