Microsoft Research Podcast

AI Testing and Evaluation: Reflections

Jul 21, 2025
Amanda Craig Deckard, Senior Director of Public Policy in Microsoft's Office of Responsible AI, shares insights on AI testing and evaluation. She discusses the need for effective governance, highlighting lessons learned about pre-deployment and post-deployment testing. The conversation emphasizes the importance of rigor, standardization, and interpretability in AI evaluation. Deckard also explores the sociotechnical impacts of AI and the necessity of collaborative efforts across sectors to ensure responsible AI development, drawing parallels from fields like cybersecurity.
INSIGHT

Testing Builds Trust but Is Complex

  • Testing is critical for building trust but is complex and spans multiple stages of AI development.
  • It balances addressing risks with enabling innovation, and it adapts to different industry sizes and contexts.
INSIGHT

Testing Regimes Vary by Domain

  • AI testing regimes range from rigid pre-deployment testing to adaptive post-deployment monitoring.
  • The domain context and type of technology influence whether testing is standardized or flexible.
ANECDOTE

Pharma vs Cybersecurity Testing Stories

  • Pharma emphasizes pre-market testing, with limited post-market follow-up due to resource constraints.
  • Cybersecurity has evolved through norms such as coordinated vulnerability disclosure and bug bounties, focusing on post-deployment risks.