Vanishing Gradients

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain

Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how a data-centric approach leads to reliable results. He outlines ten critical mistakes teams make, debunking generic metrics like 'hallucination scores' in favor of metrics tailored to your domain. Hamel shares a workflow for effective error analysis, including how to involve domain experts wisely and avoid hasty automation. Bryan Bischof joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
AI Snips
INSIGHT

Evals Are Measurement Not Magic

  • Evals are a systematic way to measure AI applications and are essential to improvement.
  • Hamel Husain stresses you cannot improve what you do not measure.
ADVICE

Begin With Error Analysis

  • Start with error analysis to find the biggest, easiest wins before writing evals.
  • Use that analysis to decide between code-based asserts or LLM judges and where to prioritize effort (see the sketch below).
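
As a rough illustration of the asserts-versus-judge split this advice describes, here is a minimal Python sketch. The names `check_response_format`, `llm_judge`, and the stubbed `call_llm` are hypothetical, invented for illustration rather than taken from the episode.

```python
import json


def check_response_format(response: str) -> bool:
    """Code-based assert: a cheap, deterministic check that the output
    is a JSON object containing the fields we require."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"answer", "sources"} <= data.keys()


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; stubbed so the sketch runs
    standalone. Swap in your provider's SDK here."""
    return "PASS"


def llm_judge(question: str, response: str) -> bool:
    """LLM-as-judge: reserved for fuzzy criteria (e.g., faithfulness)
    that a simple assert cannot capture."""
    prompt = (
        "Grade this AI answer. Reply PASS or FAIL only.\n"
        f"Question: {question}\n"
        f"Answer: {response}\n"
        "FAIL if the answer is unsupported or off-topic."
    )
    return call_llm(prompt).strip().upper().startswith("PASS")


# Error analysis assigns each observed failure mode to the cheapest
# check that catches it: formatting bugs -> assert; unsupported
# claims -> judge.
sample = '{"answer": "42", "sources": ["doc1"]}'
print(check_response_format(sample))             # True
print(llm_judge("What is the answer?", sample))  # True (stub always passes)
```

The design point is cost ordering: deterministic asserts run on every trace for free, while the LLM judge is spent only on failure modes the error analysis shows actually matter.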
ADVICE

Don't Chase Generic Off-The-Shelf Metrics

  • Avoid off-the-shelf generic metrics like 'hallucination score' as primary KPIs.
  • Use such metrics only to explore your data, then tailor metrics to your domain (see the sketch below).
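
To make "tailor metrics to your domain" concrete, here is a hedged sketch that tallies per-failure-mode rates from error-analysis labels instead of reporting one generic aggregate score; the trace data and failure-mode names are invented for illustration.

```python
from collections import Counter

# Hypothetical traces labeled during error analysis; the failure-mode
# taxonomy comes from reading your own data, not an off-the-shelf metric.
traces = [
    {"id": 1, "failure_modes": []},
    {"id": 2, "failure_modes": ["wrong_date_math"]},
    {"id": 3, "failure_modes": ["missed_handoff", "wrong_date_math"]},
    {"id": 4, "failure_modes": ["missed_handoff"]},
]

counts = Counter(mode for t in traces for mode in t["failure_modes"])
total = len(traces)

# Per-failure-mode rates point at what to fix first, which a single
# generic "hallucination score" cannot do.
for mode, n in counts.most_common():
    print(f"{mode}: {n}/{total} traces ({n / total:.0%})")
```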