All Things Product with Teresa and Petra

AI Evals & Discovery

Sep 23, 2025
Dive into the world of AI evaluations, where evals go beyond mere quality checks. Discover the nuances between golden datasets, synthetic data, and real-world traces. Learn how to spot error modes and turn them into actionable evaluations. Uncover the importance of continuous maintenance and the concept of criteria drift. Explore the interplay of evals, guardrails, and human oversight, as well as the critical role of discovery practices in shaping effective AI product evaluations.
INSIGHT

Evals Are Not Simple QA

  • Evals are more than simple QA: they measure whether an AI product is actually good in practice.
  • Traditional test scripts don't capture the ongoing, evolving nature of AI quality.
ANECDOTE

Monitoring My Interview Coach

  • Teresa built an Interview Coach and CC'd herself on feedback to monitor mistakes.
  • She found the coach sometimes suggested wrong question types, prompting deeper investigation.
INSIGHT

Golden Datasets Depend On Representativeness

  • Golden datasets pair inputs with desired outputs to score model changes.
  • Their value depends entirely on how well they represent production inputs.
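The golden-dataset idea above can be sketched in a few lines: hold a fixed set of (input, expected output) pairs and score any model version against it, so changes can be compared on the same yardstick. This is a minimal illustration, not the episode's implementation; the `run_golden_eval` helper, the toy model, and the exact-match pass criterion are all assumptions made for the example.

```python
# Minimal golden-dataset eval sketch. `run_golden_eval` and the toy
# model below are illustrative assumptions, not from the episode.

def run_golden_eval(model, golden_set):
    """Score a model callable against (input, expected_output) pairs.

    Returns (pass_rate, per-example details) so regressions on
    specific inputs can be inspected, not just the aggregate score.
    """
    results = []
    for prompt, expected in golden_set:
        output = model(prompt)
        results.append({
            "input": prompt,
            "expected": expected,
            "output": output,
            # Naive exact-match check; real evals often use rubric or
            # LLM-as-judge scoring instead.
            "pass": output.strip().lower() == expected.strip().lower(),
        })
    pass_rate = sum(r["pass"] for r in results) / len(results)
    return pass_rate, results

# Toy stand-in model and golden set, for illustration only.
def toy_model(question):
    return {"capital of france?": "Paris"}.get(question.lower(), "unknown")

golden = [
    ("Capital of France?", "Paris"),
    ("Capital of Spain?", "Madrid"),
]

score, details = run_golden_eval(toy_model, golden)
print(f"pass rate: {score:.0%}")  # toy model answers 1 of 2 correctly
```

As the snip notes, the score only means something if the golden pairs resemble real production inputs; a perfect pass rate on an unrepresentative set is false confidence.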