

The Right Way to Do AI Evals (ft. Freddie Vargus)
Are your AI agents unreliable? In this guide, we reveal a professional system for AI evals to help you build and ship better AI products, faster. Learn how to systematically test LLM performance, evaluate complex tool use, and improve multi-turn conversations. We break down the exact process for building a high-quality eval dataset, how to use milestones and minefields to track agent behaviour, and how to properly use an LLM as a judge without compromising quality. Stop guessing and start making real, measurable improvements to your AI today.
Check out Quotient AI
https://www.quotientai.co/
Sign up for AI coaching for professionals at: https://www.anetic.co
Get FREE AI tools
pip install tool-use-ai
Connect with us:
https://x.com/ToolUseAI
https://x.com/MikeBirdTech
https://x.com/freddie_v4
00:00:00 - Intro
00:02:54 - Why You Need AI Evals
00:06:13 - How to Evaluate AI Agent Tool Use
00:29:24 - The Process for Building Your First Eval Dataset
00:42:44 - Using an LLM as a Judge the Right Way
Subscribe for more insights on AI tools, productivity, and AI evals.
Tool Use is a weekly conversation with AI experts brought to you by Anetic.