How I AI

Evals, error analysis, and better prompts: A systematic approach to improving your AI products | Hamel Husain (ML engineer)

Oct 13, 2025
Hamel Husain, an ML engineer and AI consultant, discusses how to enhance AI product quality through systematic error analysis and prompt engineering. He emphasizes the importance of examining real user interactions to identify common failures. Hamel introduces a step-by-step error analysis framework and advocates for binary evaluations over arbitrary quality scores. He also outlines techniques for building custom annotation systems and prioritizing fixes based on data, ultimately guiding product teams toward more effective AI solutions.
INSIGHT

Start With Real User Traces

  • The most important step to improve AI products is looking at real user data and traces.
  • Real inputs reveal the vague, messy prompts that shape model behavior and help set priorities.
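The trace-review habit described above can be sketched as a tiny annotation loop: sample real traces, then record a binary pass/fail judgment with an open-ended note for each. This is a minimal illustration, not Hamel's actual tooling; the trace format (dicts with "input"/"output") and the helper names are hypothetical.

```python
import random

def sample_traces(traces, n=20, seed=0):
    """Draw a reproducible random sample of traces to review."""
    rng = random.Random(seed)
    return rng.sample(traces, min(n, len(traces)))

def annotate(trace, passed, note=""):
    """Attach a binary pass/fail judgment and a free-text note to a trace."""
    return {**trace, "pass": passed, "note": note}

# Toy traces standing in for real leasing-chat logs.
traces = [
    {"input": "when r u open", "output": "Our office hours are 9-5."},
    {"input": "tour?", "output": "me get you"},  # odd reply worth flagging
]

reviewed = [
    annotate(traces[0], True),
    annotate(traces[1], False, "garbled reply; answered the wrong intent"),
]
```

The binary judgment (rather than a 1-5 score) keeps annotations fast and unambiguous, which is the point of favoring pass/fail evals.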
ANECDOTE

Nurture Boss Example: Vague User Inputs

  • Nurture Boss used traces from real leasing chats to find frequent failure modes.
  • The team found vague user messages often caused the assistant to answer the wrong intent.
ANECDOTE

Humorous Strange Model Reply

  • A trace showed the assistant replying "me get you", which revealed odd model behavior.
  • This kind of surprising output motivated deeper annotation and categorization.
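Once surprising outputs are annotated into categories, prioritizing fixes reduces to counting: tally how often each failure category appears and sort descending. A minimal sketch, assuming annotations shaped like the dicts below; the category names are illustrative, not taken from the episode.

```python
from collections import Counter

# Hypothetical annotations produced during error analysis.
annotations = [
    {"pass": False, "category": "vague user input"},
    {"pass": False, "category": "garbled reply"},
    {"pass": False, "category": "vague user input"},
    {"pass": True,  "category": None},
]

# Count only failures, then rank categories by frequency.
failure_counts = Counter(
    a["category"] for a in annotations if not a["pass"]
)
priorities = failure_counts.most_common()
# e.g. [('vague user input', 2), ('garbled reply', 1)]
```

Ranking by observed frequency is what grounds "prioritizing fixes based on data": the most common failure mode gets attention first.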