
The Stack Overflow Podcast: The fastest agent in the race has the best evals
Nov 14, 2025
Benjamin Klieger, the lead engineer at Groq, shares insights on AI agent infrastructure. He discusses Compound, Groq's agent that searches the web, runs code, and executes tasks quickly. Benjamin explains how Groq's custom LPU hardware enables lightning-fast inference, turning a one-minute process into mere seconds. He also covers the importance of dynamic evals and real-time datasets, critiquing traditional benchmarks for their lack of novelty. Join him as he explores the future of efficient AI agents and the metrics that define their success.
AI Snips
Agent-as-a-Drop-In Model Replacement
- Compound turns a model call into an agent that can search the web, run code, and use external tools automatically.
- Benjamin Klieger says this delivers responses in about five to ten seconds by routing tasks to the best models; a minimal call sketch follows below.
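To make the drop-in idea concrete, here is a minimal sketch of calling an agentic model through an OpenAI-compatible chat-completions client. The base URL and the model ID are assumptions for illustration, not values confirmed in the episode; check Groq's documentation for the real ones.

```python
# Minimal sketch: treating an agentic model as a drop-in replacement for a
# plain chat-completion call. Base URL and model ID are assumptions.
import os
from openai import OpenAI  # any OpenAI-compatible client works here

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

# Same request shape as an ordinary model call; the agent decides on its own
# whether to search the web or run code before answering.
response = client.chat.completions.create(
    model="groq/compound",  # hypothetical agentic model ID
    messages=[{"role": "user", "content": "What changed in Python 3.13's garbage collector?"}],
)
print(response.choices[0].message.content)
```

Because the request shape is unchanged, existing chat-completion code can switch to the agent by swapping only the model ID.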
Route Tasks To The Right Models
- Pick the right inference provider and route tasks to models best suited for each subtask to improve quality and cost.
- Prioritize speed, because latency dramatically changes the user experience for real-time agents; see the routing sketch after this list.
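As a rough illustration of routing, the sketch below maps subtask types to model choices. The model names and the task taxonomy are invented for the example; they are not Groq's actual routing logic.

```python
# Minimal routing sketch: send each subtask to the model best suited for it.
# Model names and the task taxonomy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Subtask:
    kind: str    # e.g. "code", "summarize", "reason"
    prompt: str

# Hypothetical mapping from subtask kind to a (model, max_tokens) choice that
# trades off quality, cost, and latency.
ROUTES = {
    "code": ("large-code-model", 1024),
    "summarize": ("small-fast-model", 256),
    "reason": ("large-reasoning-model", 2048),
}

def route(task: Subtask) -> tuple[str, int]:
    """Pick a model for the subtask, falling back to a fast default."""
    return ROUTES.get(task.kind, ("small-fast-model", 512))

model, max_tokens = route(Subtask(kind="summarize", prompt="TL;DR these docs"))
print(model, max_tokens)
```

The point of the table-driven router is that cheap, fast models handle the easy subtasks while larger models are reserved for the work that actually needs them.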
Speed Comes From Parallelism And Fast Tools
- Achieving sub-10 second agent responses requires fast inference, low-latency tools, and parallelization of subtasks.
- Delegating work to sub-agents enables concurrent searches and comparisons that reduce end-to-end latency, as in the sketch below.
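To show how parallelizing sub-agent work cuts latency, here is a small asyncio sketch that fans several searches out concurrently; sub_agent_search is a hypothetical stand-in for a real web-search or tool call.

```python
# Minimal parallelism sketch: fan subtasks out to concurrent "sub-agents" so
# several searches run at once instead of back-to-back.
import asyncio

async def sub_agent_search(query: str) -> str:
    # Placeholder for a real web-search or tool call; assume ~1s of latency.
    await asyncio.sleep(1.0)
    return f"results for {query!r}"

async def answer(queries: list[str]) -> list[str]:
    # gather() runs all searches concurrently, so total latency is roughly
    # the slowest single call rather than the sum of all of them.
    return await asyncio.gather(*(sub_agent_search(q) for q in queries))

if __name__ == "__main__":
    results = asyncio.run(answer(["groq lpu", "agent evals", "compound agent"]))
    print(results)
```

With three one-second searches, the concurrent version finishes in about one second instead of three, which is the kind of win that keeps end-to-end agent responses under ten seconds.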
