
AI in Medicine is BROKEN: Stanford PhD Exposes the 95% Accuracy Lie | LLMs in Healthcare
Delta: HealthTech Innovators
Is AI really ready to replace doctors? Stanford PhD researcher Suana reveals shocking truths about medical AI that Big Tech doesn't want you to know. When she tested leading AI models like GPT-4, Claude, and DeepSeek on modified medical questions, their accuracy plummeted by up to 40%!
In this eye-opening conversation, we dive deep into:
❌ Why 95%+ accuracy on medical exams means nothing in real clinical practice
❌ How AI models fail when there's "no right answer" (which happens constantly in medicine)
❌ The dangerous gap between flashy headlines and clinical reality
✅ How doctors can safely use AI as a co-pilot (not replacement)
✅ The future of medical AI evaluation and what needs to change
Suana is a third-year PhD student in Biomedical Data Science at Stanford, pioneering real-world evaluation methods for medical AI. Her research on MedHELM and benchmarking is reshaping how we think about AI deployment in healthcare.
🔬 Key Research Discussed:
JAMA Network Open publication on AI robustness in medical diagnosis
MedHELM: 35-dataset benchmark suite for real clinical tasks
Why MedQA and USMLE-style tests don't reflect actual patient care
⚠️ CRITICAL TAKEAWAY: AI models are trained to always give an answer, even when "none of the above" is correct, a potentially dangerous flaw in medical decision-making.
📚 Resources Mentioned:
MedHELM Leaderboard (public repository available)
Research on medical AI evaluation standards
Real-world hospital deployment considerations
Timestamps:
0:00 - Introduction: Why Medical AI Evaluation is Broken
1:04 - Suana's Journey: From Computer Science to Healthcare AI
2:32 - The 3 Critical Problems with Current AI Benchmarks
8:28 - The Research: Testing AI with "None of the Above"
17:24 - Shocking Results: AI Accuracy Drops 8-40%
19:02 - Why AI Can't Say "I Don't Know"
23:10 - Take-Home Message: Use AI as Co-Pilot, Not Replacement
24:58 - Real Clinical Examples: When AI Actually Helps
28:12 - MedHELM: The Future of Medical AI Evaluation
34:35 - Final Advice for Doctors, Patients & Developers
Whether you're a physician, healthcare worker, AI developer, or patient curious about medical AI, this conversation will change how you think about artificial intelligence in healthcare.
Paper link: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2837372


