
If You Can’t Test It, Don’t Deploy It: The New Rule of AI Development?
Nov 3, 2025

Magdalena Picariello, an AI practitioner and academic, makes the case for tying AI development to business impact. She discusses shifting from traditional metrics to evaluating real-world business outcomes, building iterative testing systems for generative AI, and prioritizing high-value edge cases. She shares insights on a data-driven, test-first approach, the importance of human-crafted tests, and tools for effective evaluation. Finally, she highlights translating business KPIs into code to keep systems aligned with user needs.
GenAI Lacks Binary Ground Truth
- Generative AI often lacks a single ground truth; outputs are judged against human preferences, which vary from rater to rater.
- Correctness therefore sits on a spectrum, and pinpointing the cause of a failure inside a model is far harder than debugging traditional software (see the graded-scoring sketch below).
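One practical consequence is that evaluation code should record a distribution of judgments rather than a single pass/fail bit. Here is a minimal, hypothetical sketch of graded scoring; the 1-5 rubric, the threshold, and the function names are illustrative assumptions, not anything prescribed in the episode.

```python
from statistics import mean, stdev

def grade(ratings: list[int], pass_threshold: float = 4.0) -> dict:
    """Aggregate 1-5 rubric ratings for one model output."""
    return {
        "mean": mean(ratings),
        # spread > 0 means raters disagree: correctness is a spectrum
        "spread": stdev(ratings) if len(ratings) > 1 else 0.0,
        "passes": mean(ratings) >= pass_threshold,
    }

# Three raters score the same chatbot answer differently:
# they disagree by two full points, yet the mean still passes.
print(grade([5, 3, 4]))
```

Keeping the spread alongside the mean is what distinguishes this from a traditional unit test: a high-spread cell tells you the rubric itself, not just the model, needs work.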
Prompt Testing Saved A Failing Chatbot
- Magdalena recounts a chatbot project stuck at 60% accuracy: the bot hallucinated while the project burned through time and budget.
- They broke the logjam by building a system to test prompts and iterate rapidly, rather than hunting for a single perfect prompt (a harness sketch follows below).
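A minimal sketch of that kind of prompt test harness, assuming a substring match as the pass criterion: run every test case against each candidate prompt and compare scores, so prompt changes are measured rather than guessed. `call_model`, `TestCase`, and the function names are stand-ins, not the team's actual code.

```python
from typing import Callable

TestCase = tuple[str, str]  # (user_query, expected_substring)

def evaluate_prompt(prompt: str, cases: list[TestCase],
                    call_model: Callable[[str, str], str]) -> float:
    """Fraction of cases whose output contains the expected text."""
    hits = sum(expected.lower() in call_model(prompt, query).lower()
               for query, expected in cases)
    return hits / len(cases)

def compare_prompts(prompts: list[str], cases: list[TestCase],
                    call_model: Callable[[str, str], str]) -> dict[str, float]:
    """Score every candidate prompt so iterations are measured, not guessed."""
    return {prompt: evaluate_prompt(prompt, cases, call_model)
            for prompt in prompts}
```

Plug in a real model client for `call_model`, then iterate: tweak a prompt, rerun the suite, and keep the change only if the score moves up.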
Build Tests From User Expectations
- Start from user expectations and translate them into automated test cases before choosing models or data.
- Build a coverage matrix that maps query types, user segments, and business importance, and use it to prioritize which tests to write first (see the sketch below).
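One way to sketch such a coverage matrix, using the dimensions mentioned in the episode (query type, user segment, business importance); the categories, weights, and field names below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    query_type: str
    user_segment: str
    importance: int    # business impact, 1 (low) to 5 (critical)
    n_tests: int = 0   # how many test cases already cover this cell

matrix = [
    Cell("refund request",   "enterprise", importance=5),
    Cell("refund request",   "free tier",  importance=3),
    Cell("product question", "enterprise", importance=4, n_tests=12),
    Cell("small talk",       "free tier",  importance=1, n_tests=30),
]

# Write tests for the most important, least-covered cells first.
backlog = sorted(matrix, key=lambda c: (-c.importance, c.n_tests))
for cell in backlog:
    print(f"{cell.importance} | {cell.query_type:16} | "
          f"{cell.user_segment:10} | tests: {cell.n_tests}")
```

Sorting by importance descending and existing coverage ascending surfaces the high-value gaps first, which matches the episode's point about prioritizing high-value edge cases over easy, already-covered queries.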
