

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
Sep 30, 2025
Hamel Husain, a machine learning engineer and evals expert, discusses the pitfalls of AI evaluations and how to adopt a data-centric approach for reliable results. He outlines ten critical mistakes teams make, debunking ineffective metrics like 'hallucination scores' in favor of tailored analytics. Hamel shares a workflow for effective error analysis, including involving domain experts wisely and avoiding hasty automation. Bryan Bischoff joins as a guest to introduce the 'Failure as a Funnel' concept, emphasizing focused debugging for complex AI systems.
AI Snips
Evals Are Measurement Not Magic
- Evals are a systematic way to measure AI application quality, and they are essential to improvement.
- Hamel Husain stresses that you cannot improve what you do not measure.
Begin With Error Analysis
- Start with error analysis to surface the biggest, easiest wins before writing any evals.
- Use that analysis to decide between code-based asserts and LLM judges, and to prioritize where to spend effort (see the sketch after this list).
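A minimal sketch of what this error-analysis-first workflow might look like in practice. The trace fields, failure-mode labels, and the `assert_iso_date` helper are illustrative assumptions, not something prescribed in the episode:

```python
from collections import Counter
import re

# Hypothetical annotated traces from an open-ended review pass; the field
# names and failure-mode labels are illustrative assumptions.
annotated_traces = [
    {"id": 1, "failure_mode": "wrong_date_format"},
    {"id": 2, "failure_mode": "ignored_user_constraint"},
    {"id": 3, "failure_mode": "wrong_date_format"},
    {"id": 4, "failure_mode": None},  # no failure observed
    {"id": 5, "failure_mode": "wrong_date_format"},
]

# Tally failure modes so the most frequent (biggest win) surfaces first.
counts = Counter(t["failure_mode"] for t in annotated_traces if t["failure_mode"])
for mode, n in counts.most_common():
    print(f"{mode}: {n} occurrence(s)")

# A deterministic failure mode like 'wrong_date_format' can become a cheap
# code-based assert; subjective failures are better left to an LLM judge.
def assert_iso_date(response: str) -> bool:
    """Code-based assert targeting the top failure mode above."""
    return re.search(r"\b\d{4}-\d{2}-\d{2}\b", response) is not None

assert assert_iso_date("Your order ships 2025-10-03.")
```

The point of the counting step is prioritization: the failure mode that appears most often, and can be checked deterministically, is the cheapest eval to write first.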
Don't Chase Generic Off-The-Shelf Metrics
- Avoid generic off-the-shelf metrics like a 'hallucination score' as primary KPIs.
- Use such metrics only to explore your data, then build metrics tailored to your domain (see the sketch after this list).
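A minimal sketch of a domain-tailored metric, assuming an order-status support bot as the use case; the scenario, sample data, and regex are assumptions for illustration, not from the episode:

```python
import re

# Illustrative domain: an order-status support bot. The data and the
# order-ID pattern below are assumed for the sketch.
responses = [
    {"question": "Where is my order?", "answer": "Order #A1234 ships Friday."},
    {"question": "Where is my order?", "answer": "It should arrive soon!"},
]

def cites_order_id(answer: str) -> bool:
    """Tailored check: order-status answers must cite a concrete order ID."""
    return re.search(r"#[A-Z]\d{4}", answer) is not None

# The tailored KPI replaces a generic 'hallucination score': the share of
# order-status answers that ground themselves in a real order ID.
rate = sum(cites_order_id(r["answer"]) for r in responses) / len(responses)
print(f"order-ID citation rate: {rate:.0%}")
```

Unlike a one-size-fits-all score, a metric like this maps directly to a failure mode you actually observed in your own data.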