
The fastest agent in the race has the best evals
The Stack Overflow Podcast
Evaluating models, agents, and efficiency metrics
Benjamin explains how raw models and full agent systems are evaluated, along with new metrics like intelligence per second and intelligence per dollar.
Transcript


