All Things Product with Teresa and Petra

AI Evals & Discovery

Sep 23, 2025
Dive into the world of AI evaluations, where evals go beyond mere quality checks. Discover the nuances between golden datasets, synthetic data, and real-world traces. Learn how to spot error modes and turn them into actionable evaluations. Uncover the importance of continuous maintenance and the concept of criteria drift. Explore the interplay of evals, guardrails, and human oversight, as well as the critical role of discovery practices in shaping effective AI product evaluations.
INSIGHT

Evals Are Not Simple QA

  • Evals are more than simple QA: they measure whether an AI product is actually good in practice.
  • Traditional test scripts don't capture the ongoing, evolving nature of AI quality.
ANECDOTE

Monitoring My Interview Coach

  • Teresa built an Interview Coach and CC'd herself on feedback to monitor mistakes.
  • She found the coach sometimes suggested wrong question types, prompting deeper investigation.
INSIGHT

Golden Datasets Depend On Representativeness

  • Golden datasets pair inputs with desired outputs to score model changes.
  • Their value depends entirely on how well they represent production inputs.
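The golden-dataset idea above can be sketched in a few lines: hold a fixed set of (input, expected output) pairs and score any model version against it, so changes can be compared on the same yardstick. This is a minimal illustration, not the episode's implementation; the `run_golden_eval` helper, the toy model, and the exact-match pass criterion are all assumptions made for the example.

```python
# Minimal golden-dataset eval sketch. `run_golden_eval` and the toy
# model below are illustrative assumptions, not from the episode.

def run_golden_eval(model, golden_set):
    """Score a model callable against (input, expected_output) pairs.

    Returns (pass_rate, per-example details) so regressions on
    specific inputs can be inspected, not just the aggregate score.
    """
    results = []
    for prompt, expected in golden_set:
        output = model(prompt)
        results.append({
            "input": prompt,
            "expected": expected,
            "output": output,
            # Naive exact-match check; real evals often use rubric or
            # LLM-as-judge scoring instead.
            "pass": output.strip().lower() == expected.strip().lower(),
        })
    pass_rate = sum(r["pass"] for r in results) / len(results)
    return pass_rate, results

# Toy stand-in model and golden set, for illustration only.
def toy_model(question):
    return {"capital of france?": "Paris"}.get(question.lower(), "unknown")

golden = [
    ("Capital of France?", "Paris"),
    ("Capital of Spain?", "Madrid"),
]

score, details = run_golden_eval(toy_model, golden)
print(f"pass rate: {score:.0%}")  # toy model answers 1 of 2 correctly
```

As the snip notes, the score only means something if the golden pairs resemble real production inputs; a perfect pass rate on an unrepresentative set is false confidence.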