
How I AI: Evals, error analysis, and better prompts: A systematic approach to improving your AI products | Hamel Husain (ML engineer)
Oct 13, 2025

Hamel Husain, an ML engineer and AI consultant, discusses how to improve AI product quality through systematic error analysis and prompt engineering. He emphasizes examining real user interactions to identify common failure modes. Hamel introduces a step-by-step error-analysis framework and advocates binary evaluations over arbitrary quality scores. He also outlines techniques for building custom annotation systems and prioritizing fixes based on data, guiding product teams toward more effective AI solutions.
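The binary-evaluation idea from the episode can be sketched as follows: each trace gets a pass/fail judgment plus a free-text critique, rather than an arbitrary 1–5 quality score. All names here (`Trace`, `Judgment`, `judge_trace`) are illustrative, not from the episode or any library; the rule-based judge is a placeholder for a human annotator or an aligned LLM judge.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    user_message: str
    assistant_reply: str

@dataclass
class Judgment:
    passed: bool   # binary: did the reply meet the bar?
    critique: str  # why it failed; this text feeds error analysis

def judge_trace(trace: Trace) -> Judgment:
    # Placeholder rule-based judge for illustration only.
    if not trace.assistant_reply.strip():
        return Judgment(False, "empty reply")
    if "me get you" in trace.assistant_reply:
        return Judgment(False, "garbled/odd model output")
    return Judgment(True, "")

# Made-up traces standing in for real user data.
traces = [
    Trace("When is the apartment available?", "It is available June 1."),
    Trace("ok", "me get you"),
]
judgments = [judge_trace(t) for t in traces]
failure_rate = sum(not j.passed for j in judgments) / len(judgments)
print(failure_rate)  # fraction of failing traces
```

A binary judgment forces a clear bar and yields a failure rate you can track, which is the point Hamel makes against vague quality scores.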
Start With Real User Traces
- The most important step in improving AI products is examining real user data and traces.
- Real inputs reveal vague, messy prompts that shape model behavior and priorities.
Nurture Boss Example: Vague User Inputs
- Nurture Boss used traces from real leasing chats to find frequent failure modes.
- The team found that vague user messages often caused the assistant to answer the wrong intent.
Humorous Strange Model Reply
- A trace showed the assistant replying "me get you," revealing odd model behavior.
- This kind of surprising output motivated deeper annotation and categorization.
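The annotation-and-categorization step described above can be sketched as a simple tally: open-code each failing trace with a failure-mode label, then count labels to see what to fix first. The labels and records here are made up for illustration.

```python
from collections import Counter

# Hypothetical annotations produced by reviewing traces by hand.
annotations = [
    {"trace_id": 1, "label": "wrong_intent"},
    {"trace_id": 2, "label": "garbled_reply"},
    {"trace_id": 3, "label": "wrong_intent"},
    {"trace_id": 4, "label": "missing_followup"},
]

counts = Counter(a["label"] for a in annotations)

# Prioritize fixes by frequency, as the episode recommends.
for label, n in counts.most_common():
    print(label, n)
```

Counting labeled failure modes turns scattered observations (like the "me get you" reply) into a ranked list of problems, which is what lets a team prioritize fixes based on data.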

