The InfoQ Podcast

If You Can’t Test It, Don’t Deploy It: The New Rule of AI Development?

Nov 3, 2025
Magdalena Picariello, an AI practitioner and academic, emphasizes making AI development deliver measurable business impact. She discusses the need to shift from traditional model metrics to evaluating real-world business outcomes, implementing iterative testing systems for generative AI, and prioritizing high-value edge cases. She shares insights on a data-driven, test-first approach, the importance of human-crafted tests, and tools for effective evaluation. Lastly, she highlights translating business KPIs into code to keep systems aligned with user needs.
INSIGHT

GenAI Lacks Binary Ground Truth

  • Generative AI often lacks ground truth, so outputs are judged by human preferences that vary.
  • This creates a spectrum of correctness and makes it harder to pinpoint root causes inside models than in traditional software debugging.
ANECDOTE

Prompt Testing Saved A Failing Chatbot

  • Magdalena recounts a chatbot project stuck at 60% accuracy that hallucinated and consumed time and budget.
  • They broke the logjam by building a system to test prompts and iterate rapidly rather than hunting for a perfect prompt.
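The iteration loop described above can be sketched in a few lines. This is a minimal, hypothetical harness, not the team's actual system: `call_model` is a placeholder standing in for the real chatbot backend, and the test cases and substring-match scoring are illustrative assumptions.

```python
def call_model(prompt: str, query: str) -> str:
    """Placeholder for the real LLM call; echoes its inputs for demonstration."""
    return f"{prompt}: {query}"

# Illustrative test suite: each case pairs a user query with an expected
# substring in the answer. A real suite would use richer checks.
TEST_CASES = [
    {"query": "refund policy", "expect": "refund"},
    {"query": "opening hours", "expect": "hours"},
]

def accuracy(prompt: str) -> float:
    """Fraction of test cases whose output contains the expected substring."""
    hits = sum(
        case["expect"] in call_model(prompt, case["query"])
        for case in TEST_CASES
    )
    return hits / len(TEST_CASES)

def best_prompt(candidates: list[str]) -> tuple[str, float]:
    """Score every candidate prompt against the suite and keep the winner."""
    scored = [(p, accuracy(p)) for p in candidates]
    return max(scored, key=lambda pair: pair[1])
```

The point of the sketch is the shape of the loop: instead of hand-tuning one "perfect" prompt, every candidate is scored against the same automated suite, so regressions are visible immediately.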
ADVICE

Build Tests From User Expectations

  • Start with user expectations and translate them into automated test cases before choosing models or data.
  • Build a coverage matrix that maps query types, user segments, and business importance to prioritize tests.
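The coverage-matrix idea above can be sketched as a mapping from (query type, user segment) to a business-importance weight that orders the test suite. All names and weights here are illustrative assumptions, not values from the episode:

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    query_type: str   # e.g. "billing", "onboarding"
    segment: str      # e.g. "enterprise", "free-tier"
    prompt: str       # the user query to test

# Coverage matrix: business importance per (query_type, segment).
# Higher-weight combinations are tested first.
IMPORTANCE = {
    ("billing", "enterprise"): 10,
    ("billing", "free-tier"): 5,
    ("onboarding", "enterprise"): 7,
    ("onboarding", "free-tier"): 3,
}

def prioritize(cases: list[TestCase]) -> list[TestCase]:
    """Order test cases by business importance, highest first."""
    return sorted(
        cases,
        key=lambda c: IMPORTANCE.get((c.query_type, c.segment), 0),
        reverse=True,
    )
```

Encoding the matrix as data rather than ad-hoc judgment makes the prioritization reviewable: stakeholders can argue about the weights directly instead of about individual test cases.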