

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how to adopt a data-centric approach for reliable results. He outlines ten critical mistakes teams make, debunking ineffective metrics like 'hallucination scores' in favor of tailored analytics. Hamel shares a workflow for effective error analysis, including involving domain experts wisely and avoiding hasty automation. Bryan Bischoff joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
AI Snips
Evals Are Measurement Not Magic
- Evals are a systematic way to measure AI application quality, and they are essential to improvement.
- Hamel Husain stresses that you cannot improve what you do not measure.
Begin With Error Analysis
- Start with error analysis to surface the biggest, easiest wins before writing any evals.
- Use that analysis to decide between code-based asserts and LLM judges, and to prioritize where to spend effort (see the sketch after this list).
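A minimal sketch of what this error-analysis-first workflow might look like in practice. The trace fields, failure-mode labels, and the `assert_iso_date` helper are illustrative assumptions, not something prescribed in the episode:

```python
from collections import Counter
import re

# Hypothetical annotated traces from an open-ended review pass; the field
# names and failure-mode labels are illustrative assumptions.
annotated_traces = [
    {"id": 1, "failure_mode": "wrong_date_format"},
    {"id": 2, "failure_mode": "ignored_user_constraint"},
    {"id": 3, "failure_mode": "wrong_date_format"},
    {"id": 4, "failure_mode": None},  # no failure observed
    {"id": 5, "failure_mode": "wrong_date_format"},
]

# Tally failure modes so the most frequent (biggest win) surfaces first.
counts = Counter(t["failure_mode"] for t in annotated_traces if t["failure_mode"])
for mode, n in counts.most_common():
    print(f"{mode}: {n} occurrence(s)")

# A deterministic failure mode like 'wrong_date_format' can become a cheap
# code-based assert; subjective failures are better left to an LLM judge.
def assert_iso_date(response: str) -> bool:
    """Code-based assert targeting the top failure mode above."""
    return re.search(r"\b\d{4}-\d{2}-\d{2}\b", response) is not None

assert assert_iso_date("Your order ships 2025-10-03.")
```

The point of the counting step is prioritization: the failure mode that appears most often, and can be checked deterministically, is the cheapest eval to write first.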
Don't Chase Generic Off-The-Shelf Metrics
- Avoid generic off-the-shelf metrics like a 'hallucination score' as primary KPIs.
- Use such metrics only to explore your data, then build metrics tailored to your domain (see the sketch after this list).
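A minimal sketch of a domain-tailored metric, assuming an order-status support bot as the use case; the scenario, sample data, and regex are assumptions for illustration, not from the episode:

```python
import re

# Illustrative domain: an order-status support bot. The data and the
# order-ID pattern below are assumed for the sketch.
responses = [
    {"question": "Where is my order?", "answer": "Order #A1234 ships Friday."},
    {"question": "Where is my order?", "answer": "It should arrive soon!"},
]

def cites_order_id(answer: str) -> bool:
    """Tailored check: order-status answers must cite a concrete order ID."""
    return re.search(r"#[A-Z]\d{4}", answer) is not None

# The tailored KPI replaces a generic 'hallucination score': the share of
# order-status answers that ground themselves in a real order ID.
rate = sum(cites_order_id(r["answer"]) for r in responses) / len(responses)
print(f"order-ID citation rate: {rate:.0%}")
```

Unlike a one-size-fits-all score, a metric like this maps directly to a failure mode you actually observed in your own data.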