The InfoQ Podcast

If You Can’t Test It, Don’t Deploy It: The New Rule of AI Development?

Nov 3, 2025
Magdalena Picariello, an AI practitioner and academic, emphasizes making AI development deliver measurable business impact. She discusses the need to shift from traditional model metrics to evaluating real-world business outcomes, implementing iterative testing systems for generative AI, and prioritizing high-value edge cases. She shares insights on a data-driven, test-first approach, the importance of human-crafted tests, and tools for effective evaluation. Lastly, she highlights translating business KPIs into code to keep systems aligned with user needs.
INSIGHT

GenAI Lacks Binary Ground Truth

  • Generative AI often lacks ground truth, so outputs are judged by human preferences that vary.
  • This creates a spectrum of correctness and makes it harder to pinpoint root causes inside models than in traditional software debugging.
ANECDOTE

Prompt Testing Saved A Failing Chatbot

  • Magdalena recounts a chatbot project stuck at 60% accuracy that hallucinated and consumed time and budget.
  • They broke the logjam by building a system to test prompts and iterate rapidly rather than hunting for a perfect prompt.
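The iteration loop described above can be sketched in a few lines. This is a minimal, hypothetical harness, not the team's actual system: `call_model` is a placeholder standing in for the real chatbot backend, and the test cases and substring-match scoring are illustrative assumptions.

```python
def call_model(prompt: str, query: str) -> str:
    """Placeholder for the real LLM call; echoes its inputs for demonstration."""
    return f"{prompt}: {query}"

# Illustrative test suite: each case pairs a user query with an expected
# substring in the answer. A real suite would use richer checks.
TEST_CASES = [
    {"query": "refund policy", "expect": "refund"},
    {"query": "opening hours", "expect": "hours"},
]

def accuracy(prompt: str) -> float:
    """Fraction of test cases whose output contains the expected substring."""
    hits = sum(
        case["expect"] in call_model(prompt, case["query"])
        for case in TEST_CASES
    )
    return hits / len(TEST_CASES)

def best_prompt(candidates: list[str]) -> tuple[str, float]:
    """Score every candidate prompt against the suite and keep the winner."""
    scored = [(p, accuracy(p)) for p in candidates]
    return max(scored, key=lambda pair: pair[1])
```

The point of the sketch is the shape of the loop: instead of hand-tuning one "perfect" prompt, every candidate is scored against the same automated suite, so regressions are visible immediately.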
ADVICE

Build Tests From User Expectations

  • Start with user expectations and translate them into automated test cases before choosing models or data.
  • Build a coverage matrix that maps query types, user segments, and business importance to prioritize tests.
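The coverage-matrix idea above can be sketched as a mapping from (query type, user segment) to a business-importance weight that orders the test suite. All names and weights here are illustrative assumptions, not values from the episode:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    query_type: str   # e.g. "billing", "onboarding"
    segment: str      # e.g. "enterprise", "free-tier"
    prompt: str       # the user query to test

# Coverage matrix: business importance per (query_type, segment).
# Higher-weight combinations are tested first.
IMPORTANCE = {
    ("billing", "enterprise"): 10,
    ("billing", "free-tier"): 5,
    ("onboarding", "enterprise"): 7,
    ("onboarding", "free-tier"): 3,
}

def prioritize(cases: list[TestCase]) -> list[TestCase]:
    """Order test cases by business importance, highest first."""
    return sorted(
        cases,
        key=lambda c: IMPORTANCE.get((c.query_type, c.segment), 0),
        reverse=True,
    )
```

Encoding the matrix as data rather than ad-hoc judgment makes the prioritization reviewable: stakeholders can argue about the weights directly instead of about individual test cases.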