The Growth Podcast

How to Do AI Evals Step-by-Step with Real Production Data | Tutorial by Hamel Husain and Shreya Shankar

Jan 15, 2026
Hamel Husain and Shreya Shankar, experienced instructors in AI evals, share their expertise on building reliable production AI. They discuss the critical importance of systematic evaluations over simple demos, emphasizing real-world error analysis. Listeners learn about analyzing real traces, identifying UX failures, and refining categories for actionable insights. The duo highlights the need for tailored evaluations and proper validations, advocating for structured methodologies that prioritize high-impact issues. This engaging tutorial is a must for aspiring PMs in AI.
AI Snips
INSIGHT

Evals Are The PM Superpower

  • AI evals are the single most important new skill for PMs to ship reliable features.
  • Dogfooding alone won't scale; systematic evals catch production issues early.
ADVICE

Start With Traces, Not Metrics

  • Start with observability and capture traces (CSV/JSON is fine).
  • Take notes on traces immediately; a simple viewer or a spreadsheet is enough to start (see the sketch below).
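
A minimal sketch of that starting point, assuming traces are exported as a JSON-lines file with input and output fields (the file name and field names are assumptions, not details from the episode): load the traces, then write them to a CSV with an empty notes column you can fill in from any spreadsheet.

```python
# Sketch: turn a JSON-lines trace export into a spreadsheet-friendly CSV
# for manual note-taking. Field names ("input", "output") are assumptions.
import csv
import json

def load_traces(path: str) -> list[dict]:
    """Read one trace per line from a JSON-lines export."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def traces_to_review_csv(traces: list[dict], out_path: str) -> None:
    """Write traces to a CSV with an empty 'notes' column for annotation."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trace_id", "user_input", "assistant_output", "notes"])
        for i, trace in enumerate(traces):
            writer.writerow([i, trace.get("input", ""), trace.get("output", ""), ""])

if __name__ == "__main__":
    traces = load_traces("traces.jsonl")
    traces_to_review_csv(traces, "traces_for_review.csv")
```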
ANECDOTE

Real Trace: Missed Request And Markdown SMS

  • Hamel shows a real NurtureBoss trace where the assistant promised to check bathroom configuration but did nothing.
  • The assistant also returned markdown in an SMS, which renders incorrectly for users (a simple check is sketched below).
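
One way to catch that second failure, either in evals or right before a reply goes out over SMS, is a plain-text check on the response. This is a hedged sketch: the regex patterns and the strip_markdown fallback are illustrative assumptions, not code from NurtureBoss or the episode.

```python
# Sketch: flag assistant replies that contain markdown an SMS client
# cannot render, and crudely strip it as a fallback. Patterns are
# illustrative, not exhaustive.
import re

MARKDOWN_PATTERNS = [
    r"\*\*[^*]+\*\*",        # bold
    r"(?m)^#{1,6}\s",        # headings
    r"\[[^\]]+\]\([^)]+\)",  # links
    r"(?m)^\s*[-*]\s",       # bullet lists
]

def contains_markdown(text: str) -> bool:
    """Return True if the reply contains formatting SMS cannot render."""
    return any(re.search(pattern, text) for pattern in MARKDOWN_PATTERNS)

def strip_markdown(text: str) -> str:
    """Crude fallback: drop the most common markdown syntax for SMS."""
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)        # unwrap bold
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)  # keep link text only
    text = re.sub(r"(?m)^#{1,6}\s*", "", text)            # drop heading marks
    return text

reply = "**Great news!** See [the floor plan](https://example.com/plan)."
if contains_markdown(reply):
    reply = strip_markdown(reply)
print(reply)  # Great news! See the floor plan.
```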