

AI Testing and Evaluation: Reflections
Jul 21, 2025
Amanda Craig Deckard, Senior Director of Public Policy in Microsoft's Office of Responsible AI, shares insights on AI testing and evaluation. She discusses the need for effective governance, highlighting lessons learned about pre-deployment and post-deployment testing. The conversation emphasizes the importance of rigor, standardization, and interpretability in AI evaluation. Deckard also explores the sociotechnical impacts of AI and the necessity of collaborative efforts across sectors to ensure responsible AI development, drawing parallels from fields like cybersecurity.
Testing Builds Trust but is Complex
- Testing is critical for building trust, but it is complex and spans multiple stages of AI development.
- It balances risk mitigation with enabling innovation, and it adapts to different industry sizes and contexts.
Testing Regimes Vary by Domain
- AI testing regimes range from rigid pre-deployment checks to adaptive post-deployment monitoring.
- The domain context and type of technology influence whether testing is standardized or flexible.
Pharma vs Cybersecurity Testing Stories
- Pharma emphasizes pre-market testing, with limited post-market follow-up due to resource constraints.
- Cybersecurity evolves through norms such as coordinated vulnerability disclosure and bug bounties, focusing on post-deployment risks.