
The Skeptics Guide to Emergency Medicine SGEM#465: Not A Second Time – Single Center RCTs Fail To Replicate In Multi-Center RCTs
Jan 11, 2025
In this discussion, Dr. Scott Weingart, an ED Intensivist from New York with a rich background in Trauma and Critical Care, dives into the reliability of clinical trials. He highlights the challenges of replicating single-center randomized trials in larger, multi-center settings, pointing out significant discrepancies in outcomes. The conversation also touches on the importance of methodology in trial design and the real-world applicability of results, encouraging ongoing training and clinical judgment in emergency medicine.
Running time: 35:03
Chapters:
00:00 Intro (2 min)
01:43 Exploring Eudaimonia and Conference Insights (2 min)
03:28 Anticipation for a Unifying Medical Conference (2 min)
05:22 Validity Challenges in Clinical Trials (16 min)
21:04 Challenges of RCT Replication (6 min)
27:10 Challenges in RCT Validity (8 min)
Date: December 20, 2024
Reference: Kotani et al. Positive single-center randomized trials and subsequent multicenter randomized trials in critically ill patients: a systematic review. Crit Care. 2023
Guest Skeptic: Dr. Scott Weingart is an ED Intensivist from New York. He did fellowships in Trauma, Surgical Critical Care, and ECMO. He is a physician coach concentrating on the promotion of eudaimonia and optimal performance. Scott is best known for talking to himself about Resuscitation and Critical Care on a podcast called EMCrit, which has been downloaded more than 50 million times.
Case: A 40-year-old male presents to the emergency department (ED) with severe respiratory failure from bilateral pneumonia. After a trial of Non-Invasive Positive Pressure Ventilation (NIPPV), you’ve decided to intubate him. Should your first-pass attempt be made with a bougie or a styletted tube?
Background: The role of single-center randomized controlled trials (sRCTs) in advancing medical knowledge is significant, especially in the field of emergency medicine (EM). These trials often serve as the initial foundation for exploring interventions, providing a focused and controlled environment to test hypotheses.
However, the applicability of their findings to broader clinical settings can be limited due to their localized context. Multi-center randomized controlled trials (mRCTs) are often seen as a necessary step to validate these findings across diverse patient populations and healthcare settings. This process of validation is critical, as it addresses external validity—a cornerstone of evidence-based practice.
Historically, the move from sRCTs to mRCTs has been driven by the recognition that different institutions have varied patient demographics, resources, and protocols that might influence outcomes. While sRCTs provide essential insights, their ability to reflect real-world complexities is inherently restricted. Emergency physicians, who operate in unpredictable environments, rely on evidence that holds up across multiple settings to guide clinical decisions effectively.
Despite the apparent hierarchical superiority of mRCTs, there is debate about whether they consistently confirm the results of sRCTs. This question is pivotal to understanding how findings can be generalized and integrated into clinical guidelines. For emergency physicians, evaluating the interplay between sRCTs and mRCTs helps not only in assessing the reliability of evidence but also in shaping how we implement interventions in our practice.
Clinical Question: How often are single-center RCTs of critically ill patients reporting a mortality benefit confirmed by a multicenter RCT?
Reference: Kotani et al. Positive single-center randomized trials and subsequent multicenter randomized trials in critically ill patients: a systematic review. Crit Care. 2023
Population: sRCTs published in high-impact journals (NEJM, JAMA, or Lancet) that reported statistically significant mortality reductions in critically ill patients.
Exclusions: Quasi-randomized or non-randomized designs, multicenter trials, pediatric populations, and studies lacking mortality data.
Intervention: sRCTs
Comparison: mRCTs
Outcome:
Primary Outcome: Mortality assessed at specified time points such as hospital discharge or predefined follow-up periods.
Secondary Outcomes: Use of sRCT results in guidelines, and subsequent guideline changes based on mRCTs
Type of Study: Systematic review that followed the PRISMA guidelines and was registered in the PROSPERO International Prospective Register of Systematic Reviews
Authors’ Conclusions: “Mortality reduction shown by sRCTs is typically not replicated by mRCTs. The findings of sRCTs should be considered hypothesis-generating and should not contribute to guidelines.”
Quality Checklist for Systematic Reviews:
Was the main question being addressed clearly stated? Yes
Was the search for studies detailed and exhaustive? Yes
Were the criteria used to select articles for inclusion appropriate? Yes
Were the included studies sufficiently valid for the type of question asked? Yes
Were the results similar from study to study? Unsure
Were there any financial conflicts of interest? No
Who funded the study? The review was funded by academic institutions.
Results: The review included 19 sRCTs and 24 subsequent mRCTs. Sixteen of the sRCTs were followed by at least one mRCT. Most mRCTs found no mortality difference, in contrast to the statistically significant mortality reductions reported by the sRCTs.
Key Result: Mortality benefits reported in single-center RCTs often fail to replicate in multicenter RCTs.
Primary Outcome: The findings of only one of 16 sRCTs (6%) were confirmed by mRCTs.
Secondary Outcomes: 14 sRCTs were referenced at least once in international guidelines. Of those, 43% (6/14) have since been either suggested against or removed in the most recent versions of the guidelines.
1) PRISMA: Kotani et al. adhered to several essential PRISMA checklist items but fell short in key areas such as providing the full search strategy, assessing reporting bias, assessing the certainty of evidence, and reporting a detailed risk of bias assessment. The study does not fully satisfy the PRISMA 2020 quality criteria.[1] (see attached table).
2) Publication Bias: This occurs because the likelihood of research results being published is influenced by the nature and direction of the findings. Studies with “positive”, statistically significant, or novel results are more likely to be published, while those with “negative” or inconclusive outcomes often remain unpublished or delayed.
It is possible to quantify publication bias. A systematic review found that studies reporting significant outcomes were more likely to be published than those without, with a pooled odds ratio of 2.8 (95% CI: 2.2 to 3.5).[2] This indicates that studies with significant results had 2.8 times higher odds of being published compared to studies with non-significant results.
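To make the odds-ratio language concrete, here is a minimal Python sketch using hypothetical counts (not data from the cited review), chosen only so that the odds ratio comes out at 2.8. It also shows why "2.8 times higher odds" is not the same as "2.8 times as likely": with these counts, a significant study is only about 1.5 times as likely to be published.

```python
def odds(with_outcome: int, without_outcome: int) -> float:
    """Odds = number with the outcome / number without it."""
    return with_outcome / without_outcome


def odds_ratio(pub_sig: int, unpub_sig: int, pub_nonsig: int, unpub_nonsig: int) -> float:
    """Odds of publication for significant studies / odds for non-significant studies."""
    return odds(pub_sig, unpub_sig) / odds(pub_nonsig, unpub_nonsig)


def risk_ratio(pub_sig: int, unpub_sig: int, pub_nonsig: int, unpub_nonsig: int) -> float:
    """Probability of publication for significant studies / probability for non-significant ones."""
    p_sig = pub_sig / (pub_sig + unpub_sig)
    p_nonsig = pub_nonsig / (pub_nonsig + unpub_nonsig)
    return p_sig / p_nonsig


# Hypothetical counts, NOT from the systematic review cited above
counts = dict(pub_sig=70, unpub_sig=25, pub_nonsig=50, unpub_nonsig=50)
print(f"odds ratio: {odds_ratio(**counts):.2f}")  # 2.80
print(f"risk ratio: {risk_ratio(**counts):.2f}")  # 1.47
```

The two ratios are close only when the outcome is rare; because publication is a common outcome, the odds ratio noticeably overstates the relative probability of being published.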
This imbalance can skew the body of available evidence, leading to overestimation of intervention effects, misrepresentation of true outcomes, and flawed decision-making in clinical practice, policy development, or future research. We should try to move away from thinking of studies as positive or negative. If you have asked a good question and used appropriate methods, then it does not matter whether the results are positive or negative. Science has moved forward, and these results should be part of the medical literature to minimize publication bias.
3) Heterogeneity in Study Populations: Variability in patient demographics, settings, and interventions in mRCTs vs. sRCTs might contribute to conflicting results.
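4) sRCTs: Single-center studies often have unique settings or expertise that may not generalize to other centers or to multicenter trials. They also tend to have smaller sample sizes and therefore lower statistical power, which makes a statistically significant result more likely to be a false positive (Type I error) and more likely to overestimate the true effect than a similar result from a larger mRCT. Here are some examples that illustrate this topic: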
LEUVEN Trial - van den Berghe et al. Intensive insulin therapy in critically ill patients. N Engl J Med. 2001 Nov (Type I Error?)
Early Goal-Directed Therapy - Rivers et al. Early goal-directed therapy in the treatment of severe sepsis and septic shock. N Engl J Med. 2001 Nov (Hidden Confounders & Control Group Evolution?) SGEM#69 and SGEM#92
PREOXI (note this was mRCT) - Gibbs et al. Noninvasive Ventilation for Preoxygenation during Emergency Intubation. N Engl J Med. 2024 Jun (Hidden Confounder?) SGEM#447
BEAM - Driver et al. Effect of Use of a Bougie vs Endotracheal Tube and Stylet on First-Attempt Intubation Success Among Patients With Difficult Airways Undergoing Emergency Intubation: A Randomized Clinical Trial. JAMA. 2018 Jun (Different Skill Sets?) SGEM#271
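The statistical point about small trials can be sketched with the standard positive-predictive-value calculation for a significant result: PPV = (power × prior) / (power × prior + α × (1 − prior)). Strictly, the Type I error rate of any single trial is fixed by α; what a small sample lowers is power, and lower power makes a "positive" result more likely to be a false positive. The Python below is a minimal illustration. The prior probability that an intervention truly works and the two power values are hypothetical assumptions, not figures from Kotani et al. or from any of the trials listed above.

```python
def positive_predictive_value(power: float, alpha: float, prior: float) -> float:
    """Probability that a statistically significant result reflects a true effect.

    power: probability of detecting a real effect (1 - Type II error rate)
    alpha: significance threshold (Type I error rate when there is no effect)
    prior: assumed share of tested interventions that truly work (hypothetical)
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 25% of tested interventions truly work, alpha = 0.05
for label, power in [("small single-center trial", 0.30), ("large multicenter trial", 0.80)]:
    ppv = positive_predictive_value(power=power, alpha=0.05, prior=0.25)
    print(f"{label}: power {power:.0%}, PPV {ppv:.0%}")
```

Under these assumptions, roughly one in three significant results from the small trial would be false positives (PPV about 67%), compared with about one in six for the larger trial (PPV about 84%). Changing the assumed prior shifts both numbers, but the lower-powered trial always has the lower PPV.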
5) Guidelines: They are meant to guide care and should not be considered or used as GODlines. Research indicates that the validity of guideline recommendations diminishes over time. A study published in the Canadian Medical Association Journal (CMAJ) analyzed the lifespan of clinical guideline recommendations and found that approximately 90% remained valid after one year.[3] However, this validity decreased to about 81% after three years and 78% after four years.
This data suggests that a significant proportion of recommendations may become outdated within a few years of publication. In the study we are reviewing today, of the 14 sRCTs referenced at least once in international guidelines, six (43%) have since been either removed or suggested against in the most recent versions of the relevant guidelines.
This informs my position that we should be skeptical of the push to blindly follow guidelines when we are pressured by organizations like the American Heart Association to “get with the guidelines”.[4] The recommendations are often not based on high-quality evidence.[5] How closely should we adhere to a specific recommendation (25%, 50%, 75% or 100%)? We know that in the EBM framework, the literature is only one of three pillars. We still need to use our clinical judgement and ask the patient about their preferences and values.
Comment on Authors’ Conclusion Compared to SGEM Conclusion: We generally agree with the authors’ conclusions that the results of sRCTs should be considered hypothesis-generating and should not contribute to clinical practice guidelines.
SGEM Bottom Line: Be skeptical of accepting the conclusions of sRCTs unless you can precisely duplicate the conditions that led to the positive sRCT results.
Case Resolution: You choose to use the bougie on your first pass because you have trained extensively with the device and believe you are more akin to the BEAM trial clinicians than to the BOUGIE trial clinicians.

