Microsoft Research Podcast

AI Testing and Evaluation: Reflections

Jul 21, 2025
Amanda Craig Deckard, Senior Director of Public Policy in Microsoft's Office of Responsible AI, shares insights on AI testing and evaluation. She discusses the need for effective governance, highlighting lessons learned about pre-deployment and post-deployment testing. The conversation emphasizes the importance of rigor, standardization, and interpretability in AI evaluation. Deckard also explores the sociotechnical impacts of AI and the necessity of collaborative efforts across sectors to ensure responsible AI development, drawing parallels from fields like cybersecurity.
INSIGHT

Testing Builds Trust but Is Complex

  • Testing is critical for building trust but is complex and spans multiple stages of AI development.
  • It balances addressing risks with enabling innovation, and it adapts to different industry sizes and contexts.
INSIGHT

Testing Regimes Vary by Domain

  • AI testing regimes range from rigid pre-deployment testing to adaptive post-deployment monitoring.
  • The domain context and type of technology influence whether testing is standardized or flexible.
ANECDOTE

Pharma vs Cybersecurity Testing Stories

  • Pharma emphasizes pre-market testing, with limited post-market follow-up due to resource constraints.
  • Cybersecurity has evolved through norms such as coordinated vulnerability disclosure and bug bounties, focusing on post-deployment risks.