Delta: HealthTech Innovators

AI in Medicine is BROKEN: Stanford PhD Exposes the 95% Accuracy Lie | LLMs in Healthcare

Oct 6, 2025
A Stanford PhD researcher reveals alarming truths about AI in medicine, highlighting the stark contrast between claimed accuracy and real-world performance. She discusses how models can dramatically fail when faced with questions that have no definitive answer. The conversation emphasizes the need for more realistic evaluations of AI tools and advises clinicians to approach AI as a supportive co-pilot rather than a replacement. With cautionary insights on model performance and patient safety, this discussion reshapes our understanding of AI deployment in healthcare.
INSIGHT

Benchmarks Don't Mirror Real Clinical Data

  • High scores on medical benchmarks like MedQA do not reflect performance on messy, real-world EHR data and workflows.
  • Real clinical evaluation needs messy notes, longitudinal context, and information distributed across records (see the sketch after this list).
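As a rough illustration of the second bullet, the sketch below assembles one evaluation input from notes scattered across a patient's record, instead of a single self-contained benchmark vignette. The `Note` structure, field names, and toy data are assumptions for illustration, not the speaker's actual pipeline.

```python
# Hypothetical sketch: building a longitudinal evaluation context from
# information distributed across multiple EHR notes, rather than from a
# single clean benchmark vignette. All field names are assumed.
from dataclasses import dataclass

@dataclass
class Note:
    date: str        # ISO date of the encounter
    note_type: str   # e.g. "progress", "discharge", "lab"
    text: str        # raw, messy clinical free text

def build_eval_context(notes: list[Note]) -> str:
    """Concatenate a patient's notes in chronological order so the
    model under evaluation sees the distributed, longitudinal record."""
    ordered = sorted(notes, key=lambda n: n.date)
    return "\n\n".join(
        f"[{n.date} | {n.note_type}]\n{n.text}" for n in ordered
    )

# Toy record (hypothetical):
record = [
    Note("2025-03-02", "lab", "Cr 2.1 (baseline 1.0)"),
    Note("2025-03-01", "progress", "Pt c/o fatigue x3d. Meds recon'd."),
]
print(build_eval_context(record))
```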
INSIGHT

MCQs Hide Open-Ended Clinical Reasoning

  • Multiple-choice tests hide the open-ended reality of clinical decision-making, where providers generate differential diagnoses.
  • Evaluations must test open-ended tasks that produce variable answers, not fixed A/B/C choices (a minimal sketch of the contrast follows this list).
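To make the contrast concrete, here is a minimal sketch of the two scoring styles: an MCQ-style scorer that checks a single fixed letter, versus an open-ended scorer that gives credit for any diagnosis in a clinician-defined reference set. The function names, signatures, and sample data are hypothetical, not from the episode.

```python
# Hypothetical sketch: exact-match MCQ scoring vs. open-ended scoring
# against a set of clinically acceptable answers. Illustrative only.

def score_mcq(model_choice: str, answer_key: str) -> bool:
    """MCQ-style: a single fixed letter is either right or wrong."""
    return model_choice.strip().upper() == answer_key.strip().upper()

def score_open_ended(model_differential: list[str],
                     acceptable: set[str]) -> float:
    """Open-ended: credit any overlap with a clinician-defined
    reference set, since several answers may all be valid."""
    normalized = {dx.strip().lower() for dx in model_differential}
    hits = normalized & {dx.lower() for dx in acceptable}
    return len(hits) / len(acceptable) if acceptable else 0.0

# Toy usage (hypothetical data):
print(score_mcq("B", "b"))  # True
print(score_open_ended(
    ["community-acquired pneumonia", "pulmonary embolism"],
    {"community-acquired pneumonia", "pulmonary embolism",
     "chf exacerbation"},
))  # ~0.67: partial credit, no single "correct letter"
```

The open-ended scorer reflects the episode's point that several answers may all be valid depending on context, so grading must tolerate variable, free-text output rather than demand one fixed choice.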
ANECDOTE

USMLE Scores Didn't Prepare For Residency

  • Roupen described how high USMLE scores didn't prepare him for residency's messy clinical work.
  • He emphasized shared decision-making in oncology where multiple treatments may all be valid depending on patient preference.