

The Skeptics Guide to Emergency Medicine
Dr. Ken Milne
Meet ’em, greet ’em, treat ’em and street ’em

Feb 15, 2020 • 20min
SGEM#284: Might as Well Jump, but We would Recommend a Parachute
Date: February 11th, 2020
Reference: Yeh et al. Parachute use to prevent death and major trauma when jumping from aircraft: randomized controlled trial. BMJ 2018.
Guest Skeptic: Marcus Prescott is a nurse in Norway. He is also now a third-year medical student.
Case: A 32-year-old woman with no previous medical history calls you while a passenger on a crashing plane. She has been offered a parachute by the flight attendant but is unsure whether jumping from the plane is wise. You quickly scour the literature for evidence to inform her decision.
Background: The parachute – an umbrella term for devices that slow the motion of an object through an atmosphere by creating drag – was first deployed in China roughly 4,000 years ago. The modern versions reached widespread use with the invention of heavier-than-air flight early last century.
Different variants of parachutes have been used both for recreational and safety purposes; in either case aiming to avoid death in people falling from heights presumed to be lethal. Despite the near universal application, a systematic review from 2003 (Smith and Pell, BMJ) found no RCTs of parachute intervention.
That systematic review published in the BMJ is a classic paper and part of their annual holiday edition. It stated that there was observational data showing parachutes failed at times to prevent morbidity and mortality. There are also case reports of free falls that did not result in 100% mortality.
The authors suggested taking evidence-based medicine advocates up in a plane for a double blinded randomized control trial. The intervention would be a parachute and the control arm would be a sham parachute (backpack). To make it more rigorous, anyone who survived the first jump would cross over into the other arm of the study and jump again. Only then would we have definitive evidence that a parachute was effective in preventing death and major trauma related to gravitational challenges.
After years of trying to organize a trial, researchers were finally able to recruit some volunteers to jump out of a plane with a parachute or backpack.
Clinical Question: Do parachutes reduce death or major injury when jumping from aircraft?
Reference: Yeh et al. Parachute use to prevent death and major trauma when jumping from aircraft: randomized controlled trial. BMJ 2018.
Population: Adults 18 years of age and older, seated on aircraft and deemed rational decision makers.
Intervention: Jumping from aircraft with parachute
Comparison: Jumping from aircraft with backpack
Outcome:
Primary Outcome: Composite of death and major traumatic injury (ISS>15) within five minutes of impact or at 30 days.
Secondary Outcomes: Health status and subgroup analysis based on type of aircraft or previous parachute use.
Authors’ Conclusions: “Parachute use did not significantly reduce death or major injury when jumping from aircraft in the first randomized evaluation of this intervention. However, the trial was only able to enroll participants on small stationary aircraft on the ground, suggesting cautious extrapolation to high altitude jumps. When beliefs regarding the effectiveness of an intervention exist in the community, randomized trials might selectively enroll individuals with a lower perceived likelihood of benefit, thus diminishing the applicability of the results to clinical practice.”
Quality Checklist for Randomized Clinical Trials:
The study population included or focused on those in the emergency department. No
The patients were adequately randomized. Yes
The randomization process was concealed. Yes
The patients were analyzed in the groups to which they were randomized. Yes
The study patients were recruited consecutively (i.e. no selection bias). No
The patients in both groups were similar with respect to prognostic factors. Unsure
All participants (patients, clinicians, outcome assessors) were unaware of group allocation. No
All groups were treated equally except for the intervention. Yes
Follow-up was complete (i.e. at least 80% for both groups). Yes
All patient-important outcomes were considered. Yes
The treatment effect was large enough and precise enough to be clinically significant. No
Key Results: They screened 92 adults with only 23 agreeing to be in the trial. The median age was 38 years and 43% were female.
Parachutes did not reduce death or major injury
Primary Outcome:
Composite of death and major traumatic injury (ISS>15) within five minutes of impact was 0% vs. 0% with p>0.9
Composite of death and major traumatic injury (ISS>15) within 30 days was 0% vs. 0% with p>0.9
Secondary Outcomes:
No statistical difference in health status
No statistical differences when stratified by type of aircraft or previous parachute use.
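A 0% vs. 0% result from so few jumps says very little about the true risk. As a rough sketch (not an analysis from the paper), the exact one-sided 95% upper bound on an event rate when zero events are seen in n participants is 1 - 0.05^(1/n). The arm size of 12 is the intervention arm described in these notes; the control arm size of 11 is an assumption inferred from the 23 enrolled participants, and the function name is illustrative.

```python
def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper (1 - alpha) bound on the event rate when 0 of n had the event."""
    if n <= 0:
        raise ValueError("n must be positive")
    # With a true event rate p, the chance of seeing zero events is (1 - p)^n.
    # Setting (1 - p)^n = alpha and solving for p gives the bound.
    return 1.0 - alpha ** (1.0 / n)


# Arm sizes: 12 from the notes, 11 assumed (23 enrolled minus 12)
for arm, n in (("parachute", 12), ("backpack", 11)):
    print(f"{arm}: 0/{n} events, upper bound {upper_bound_zero_events(n):.1%}")
```

Even with no events in either arm, event rates above 20% cannot be excluded, which is one way of seeing that "no statistical difference" is not the same as "no difference".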
Talk Nerdy: There were many limitations to this study including a composite outcome for the primary outcome. However, we will only discuss five things that threaten the validity and interpretation of this trial.
Convenience Sample: These were not consecutive adults sitting on an airplane. Participants were selected from those seated next to the recruiter. This could have introduced some selection bias into the study population. When we use the term “bias” we are not talking about random noise in the data but rather something that systematically moves us away from the true point estimate.
Lack of Blinding: Allocation to parachute or backpack was not concealed to the investigator who assigned the treatment. This too could have led to some selection bias. The groups were unbalanced with more frequent fliers in the control (backpack) group. This may or may not have impacted the results.
Ikea Bias: Most of the participants who were randomized were study investigators. They would be unblinded to the study hypothesis and could be more invested in the results because they helped design the study. Whether or not this would have a significant impact on the results is unclear.
Lack of Deployment: In the intervention arm none of the 12 participants had their parachute open. This makes the trial very difficult to interpret. If the parachute did deploy properly would it have provided a benefit? However, none of the 12 participants died or were injured as a result of the parachute not opening during the jump.
Fatal Flaw: There was a difference between participants and non-participants. Participants jumped from a mean altitude of 0.6m traveling at a velocity of 0km/hr. This is in comparison to the non-participants who were at a mean altitude of 9,000m and traveling at a velocity of 800km/hr.
Comment on Authors’ Conclusion Compared to SGEM Conclusion: We generally agree with the authors’ conclusions.
SGEM Bottom Line: Wear a parachute if jumping out of a moving aircraft in the air to prevent morbidity and mortality.
Case Resolution: Despite the lack of high-quality evidence demonstrating the efficacy of parachutes, you advise your friend to use the parachute being offered by the flight attendant.
Clinical Application: Based on your understanding of physics and reality, you would recommend people use parachutes if jumping out of an aircraft that is flying. While it does not guarantee you will not be injured or die it is the best evidence we have on the topic. In addition, more research is not needed to determine if parachutes prevent morbidity or mortality due to gravitational challenges.
What Do I Tell the Passenger? Accept the parachute being provided by the flight attendant.
Keener Kontest: Last week’s winner was Jonathan Carter. He knew Kingston was the first capital of Canada.
Listen to the podcast to hear this week’s question. Send your answer to TheSGEM@gmail.com with “keener” in the subject line. The first correct answer will receive a cool skeptical prize.
Other FOAMed:
Hayes et al. Most medical practices are not parachutes: a citation analysis of practices felt by biomedical authors to be analogous to parachutes. CMAJ 2018
Potts and Grossman. Parachute approach to evidence based medicine. BMJ 2006
Mamas. What a Parachute Study Tells Us About RCTs. Medscape 2018
First10EM: Finally, an RCT of parachutes
Remember to be skeptical of anything you learn, even if you heard it on the Skeptics’ Guide to Emergency Medicine.

Feb 8, 2020 • 24min
SGEM#283: Can You Be Absolutely Right in Diagnosing a SAH Using a Clinical Decision Instrument?
Date: January 29th, 2020
Reference: Perry et al. Prospective Implementation of the Ottawa Subarachnoid Hemorrhage Rule and 6-Hour Computed Tomography Rule. Stroke 2019
Guest Skeptic: Dr. Rory Spiegel is an EM/CC doctor who splits his time in the Emergency Department and Critical Care department. He also has this amazing #FOAMed blog called EM Nerd.
Case: A 48-year-old male presents to your emergency department with a sudden onset headache, which started about one hour prior to arrival. The headache is severe in quality and the patient does not have a history of similar headaches. It is associated with nausea, vomiting and photophobia.
Background: Headaches are a common complaint presenting to the emergency department. Subarachnoid hemorrhage represents one of the most serious underlying causes of headaches and we have covered it a number of times on the SGEM:
SGEM#48: Thunderstruck – Subarachnoid Hemorrhage
SGEM#134: Listen, to what the British Doctors Say about LPs post CT for SAH
SGEM#140: CT Scans to Rule Out Subarachnoid Hemorrhages in A Non-Academic Setting
SGEM#201: It’s in the Way That You Use It – Ottawa SAH Tool
In patients who present neurologically intact, making the diagnosis early is key to preventing subsequent, more life-threatening bleeding. A number of controversies surround the diagnosis of SAH in the emergency department. Two of the more provocative are the use of the Ottawa SAH Rule and whether a lumbar puncture (LP) is required following a negative CT if the scan is performed within 6 hours of symptom onset.
The Ottawa SAH Rule (tool) was covered on SGEM#201. The bottom line from that study was that, before being widely adopted, the clinical decision instrument needed external validation, a meaningful impact analysis, and an assessment of patient acceptability when the rule is incorporated into a shared decision-making instrument.
We were surprised that the authors did not include the excellent SRMA on this topic by Carpenter et al. AEM 2016 in their background/introduction material.
Clinical Question: What is the clinical impact of the Ottawa SAH Rule and the 6-hour CT Rule compared to standard care when implemented in six emergency departments across Canada?
Reference: Perry et al. Prospective Implementation of the Ottawa Subarachnoid Hemorrhage Rule and 6-Hour Computed Tomography Rule. Stroke 2019
The senior author on this publication was the legend of emergency medicine, Dr. Ian Stiell from Ottawa.
Population: Neurologically intact adult presenting to the ED with a chief complaint of a nontraumatic, acute headache, or syncope associated with a headache.
Exclusions: Patients with any of the following:
3 or more previous similar headaches (ie, same intensity/character as their current headache) over a period of >6 months (eg, established migraines)
confirmed SAH before arrival at study ED
previously investigated with CT and LP for the same headache
papilledema
new focal neurological deficit
previous diagnosis of intracranial aneurysm or SAH
known brain neoplasm
cerebroventricular shunt
headache within 72 hours following a LP
headache described as gradual or peak intensity beyond 1 hour.
Intervention: Physicians were actively encouraged to use the Ottawa SAH Rule and the 6-hour-CT Rule to determine when to pursue a diagnostic workup for SAH and when a CT alone was an appropriate workup. Clinicians had the option to override the proposed rules.
Comparison: The control phase was standard care. Clinicians were encouraged to not use any clinical decision instrument and make the decision to pursue diagnostic studies based on their own clinical discretion.
Outcome: The primary outcome was the clinical impact of the Ottawa SAH Rule and 6-hr CT Rule for making the diagnosis of a SAH compared to usual care. SAH was defined as:
Subarachnoid blood on CT
Xanthochromia in the cerebrospinal fluid
Red blood cells in the final tube of cerebrospinal fluid with an aneurysm demonstrated on cerebral angiography, CTA, or magnetic resonance imaging angiography.
Authors’ Conclusions: “This implementation study validates the accuracy of the Ottawa SAH rule and 6-hour-CT rule for SAH. Both the Ottawa SAH rule and the 6-hour-CT rule are now fully validated and ready to use clinically. Using the Ottawa SAH rule did not increase or decrease the number of investigations performed. The 6-hour-CT rule resulted in a modest decrease in testing following a normal early CT. Utilizing the Ottawa SAH rule and the 6-hour-CT rule allows clinicians in ED to safely standardize care for alert, patients with acute headache.”
Quality Checklist for A Diagnostic Study:
The clinical problem is well defined. Yes
The study population represents the target population that would normally be tested for the condition (ie no spectrum bias). Yes
The study population included or focused on those in the emergency department. Yes
The study patients were recruited consecutively (ie no selection bias). Yes
The diagnostic evaluation was sufficiently comprehensive and applied equally to all patients (ie no evidence of verification bias). No
All diagnostic criteria were explicit, valid and reproducible (ie no incorporation bias) No
The reference standard was appropriate (ie no imperfect gold-standard bias). No
All undiagnosed patients underwent sufficiently long and comprehensive follow-up (ie no double gold-standard bias). Unsure
The likelihood ratio(s) of the test(s) in question is presented or can be calculated from the information provided. Yes
The precision of the measure of diagnostic performance is satisfactory. Yes
Key Results: They had 3,672 patients who met inclusion criteria. There were 1,743 patients in the control phase of the study and 1,929 patients in the implementation phase. The mean age was 45 years and 60% were female. They identified 188 patients (5.1%) who had a SAH.
Ottawa SAH Rule:
Sensitivity 100% (95% CI 98.1% to 100%)
Specificity 12.7% (95% CI: 11.7% to 13.9%)
6hr CT Rule:
Sensitivity 95% (95% CI 89.8% to 98.5%)
Specificity 100% (95% CI: 99.7% to 100%)
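The checklist notes that likelihood ratios can be calculated from the information provided. A minimal sketch of that arithmetic follows, using the point estimates above and the overall 5.1% prevalence as the pre-test probability. The function names are illustrative, and the calculation ignores the confidence intervals around each estimate.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if specificity <= 0.0:
        raise ValueError("specificity must be > 0 to compute LR-")
    # LR+ is undefined (infinite) when there are no false positives
    lr_pos = math.inf if specificity >= 1.0 else sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if math.isinf(lr):
        return 1.0
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


prevalence = 188 / 3672  # SAH prevalence from the key results (about 5.1%)

rules = {"Ottawa SAH Rule": (1.00, 0.127), "6-hr CT Rule": (0.95, 1.00)}
for name, (sens, spec) in rules.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
    print(f"  probability of SAH if positive: {post_test_probability(prevalence, lr_pos):.1%}")
    print(f"  probability of SAH if negative: {post_test_probability(prevalence, lr_neg):.1%}")
```

With a specificity this low, a positive Ottawa SAH Rule barely shifts the probability of disease; its value lies almost entirely in the negative result.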
1. Patient Population: A fairly wide group of patients was considered for this study. With a rule like the Ottawa SAH Rule, where the specificity is so low, you would ideally like to apply it in a population at high risk for the disease state. So, in patients in whom I am already considering a workup for SAH and if the Ottawa SAH Rule is negative, I can stop the work up. This would be similar to the PERC rule. Applying the Ottawa SAH Rule in a more generalized group of patients may lead to an increase in downstream testing.
In contrast this may have helped the 6-hr CT Rule as not a lot of these patients (5%) ended up having a SAH. Now it did go up to 9% when only the subset of patients presenting within 6 hours of symptom onset were included.
2. Gold Standard: The gold standard here is a bit complicated. Ideally what you would like is a measure that accurately diagnoses SAH, and it would be preferable if the investigators used this same measure on all patients included in the study. But that is not always practical in real world studies. So, in this case you would ideally like everyone to receive an LP and then some form of angiography to assess for aneurysm if the LP was positive. Obviously, it’s impractical and ethically questionable to perform an LP and angiography on all the patients in this study so the authors had to use different gold standards depending on what was found on the initial CT scan. This can lead to a number of forms of bias.
Incorporation bias occurs when results of the test under study are actually used to make the final diagnosis. This makes the test appear more powerful by falsely raising the sensitivity and specificity.
In this case, subarachnoid blood seen on the CT scan was included in the gold standard definition of SAH. Obviously, this will make the specificity of the CT scan appear really good and, in this case, it was 100%.
Partial verification bias is a type of measurement bias in which the results of a diagnostic test affect whether the gold standard procedure is used to verify the test result. This type of bias is also known as “work-up bias” or “referral bias”.
In this case, patients with a negative CT did not always undergo an LP. Since not all patients underwent the gold standard testing, this can influence the diagnostic accuracy of the test in question. Here the 6-hr CT may appear more accurate than it is in reality: if some SAHs are missed on CT and those patients do not undergo an LP, they may be counted as true negative results.
3. Proxy Outcome Measure: In cases when a consistent gold standard cannot be used on all subjects a proxy measure can be used in its place. In this case the authors used the proxy outcome of alive and well at 6 months as a surrogate for not having an SAH. This seems like a reasonable surrogate. If you had a headache and did not receive any intervention for an aneurysm and did not have a SAH, the likelihood that your initial headache was a herald bleed is minimal.
This is known as differential verification bias (double gold standard). This occurs when the test results influence the choice of the reference standard. So, a positive index test gets an immediate/gold standard test whereas the patients with a negative index test get clinical follow-up for disease. This can raise or lower sensitivity/specificity.
The question is what is an adequate definition of not having a SAH on 6-month follow up? The authors used a review of the medical records of the hospital to which patients initially presented as well as every hospital with neurosurgical capacity in the same city as the index ED visit. Is this adequate follow up?
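To make the partial verification problem described above concrete, here is a small sketch with hypothetical counts (these are NOT data from Perry et al.). It assumes that a missed bleed which never gets an LP or other verification is silently counted as a true negative, so it drops out of the sensitivity calculation. The function name and numbers are illustrative only.

```python
def apparent_sensitivity(true_pos: int, false_neg: int, verified_false_neg: int) -> float:
    """Sensitivity as a study would report it when only some false negatives are verified.

    Unverified false negatives are never discovered, so they are counted as
    true negatives and disappear from the sensitivity denominator.
    """
    if not 0 <= verified_false_neg <= false_neg:
        raise ValueError("verified_false_neg must be between 0 and false_neg")
    return true_pos / (true_pos + verified_false_neg)


# Hypothetical: CT detects 95 of 100 bleeds and misses 5
true_pos, false_neg = 95, 5
print(f"all 5 misses verified: {apparent_sensitivity(true_pos, false_neg, 5):.1%}")
print(f"2 of 5 misses verified: {apparent_sensitivity(true_pos, false_neg, 2):.1%}")
print(f"no misses verified: {apparent_sensitivity(true_pos, false_neg, 0):.1%}")
```

The fewer CT-negative patients who receive the reference standard, the closer the reported sensitivity drifts toward 100%, regardless of how the test actually performs.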

Jan 29, 2020 • 3min
SGEM#281ss: Balance of Prognostic Factors in Randomized Controlled Trials
Date: January 25th, 2020
SGEM#281: EM Docs Got an AmbuBag
Statistically Significant: Dan Lane
We want to make the SGEM even better and address some of the criticisms from the ClinEpi world about clinicians trying to do critical appraisal. In order to do that we now have Dr. Dan Lane, who has a PhD in Clinical Epidemiology. He will be commenting on each of the SGEM episodes.
On this episode of Statistically Significant we are going to discuss the importance of balance of prognostic factors in randomized controlled trials, using the PreVent trial as an example.
Characteristics that indicate when a patient is more likely to have an outcome, what we call prognostic factors, need to be accounted for when assessing the effectiveness of a treatment. Without accounting for prognostic factors, the measures of treatment effect can be biased due to observed or unobserved factors amongst patients in each group. Consider if this same study had been conducted as a non-randomized design – clinicians may have decided to ventilate select patients between induction and intubation because they perceived them as more unstable prior to induction. These patients may also be at higher risk for hypoxia during this period for the same reasons the clinicians chose to ventilate them, and therefore they would look worse when compared to patients not receiving ventilation if you did not account for these reasons – this is what epidemiologists call an indication bias.
The goal of randomization in clinical trials is to balance patient characteristics between the different groups being investigated in the study. By randomly assigning patients to groups, the sole indication for receiving the treatment is the randomization process. As long as there are enough patients randomized, all known and unknown prognostic factors will be mathematically balanced between the groups. Therefore when talking about the balance of prognostic factors as part of critical appraisal, the key point to realize is there are both known and unknown factors. Although in this study they found some statistical differences between measured prognostic factors at baseline, these are just the prognostic factors that happen to be reported by the investigators. If we trust their randomization process then we can assume that the overall risk of the primary outcome, which includes measured and unmeasured prognostic factors, is mathematically balanced between the groups.
One final point - the use of statistical hypothesis testing to compare prognostic factors is actually inappropriate here because by definition the null hypothesis that the two groups are the same is assumed to be true when the two groups are selected based on randomization. Therefore, any differences between the groups would be due to chance alone and considering them different would be a type 1 error.
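A short simulation can illustrate this last point. The sketch below is an assumption-laden toy, not a re-analysis of PreVent: it repeatedly randomizes 400 hypothetical patients (roughly the size of the trial) into two equal arms, tests one binary baseline factor with a pooled two-proportion z-test, and counts how often p < 0.05 even though allocation was purely random. All names, the 50% factor rate and the seed are illustrative.

```python
import math
import random


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def share_of_significant_baseline_tests(trials: int = 2000, n: int = 400,
                                        factor_rate: float = 0.5, seed: int = 1) -> float:
    """Simulate randomized trials; return the share with p < 0.05 for one baseline factor."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        # Each patient either has the prognostic factor or not, before allocation
        patients = [rng.random() < factor_rate for _ in range(n)]
        rng.shuffle(patients)  # random allocation: first half to arm A, rest to arm B
        arm_a, arm_b = patients[: n // 2], patients[n // 2:]
        p = two_proportion_p_value(sum(arm_a), len(arm_a), sum(arm_b), len(arm_b))
        significant += p < 0.05
    return significant / trials


share = share_of_significant_baseline_tests()
print(f"{share:.1%} of simulated trials show a 'significant' baseline imbalance")
```

Roughly one test in twenty comes out "significant" by construction, so a table with many baseline characteristics will often contain a few such differences even when randomization worked exactly as intended.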
Additional Reading:
Altman and Bland. Treatment allocation in controlled trials: why randomise? BMJ May 1999
Sander Greenland. Randomization, statistics, and causal inference. Epidemiology Nov 1990
Stephen Senn. Baseline Balance and Valid Statistical Analyses: Common Misunderstandings. Applied Clinical Trials. May 2005.

Jan 25, 2020 • 15min
SGEM#281: EM Docs Got an AmbuBag – The PreVent Trial
Date: January 9th, 2020
Reference: Casey et al. Bag-Mask Ventilation during Tracheal Intubation of Critically Ill Adults. NEJM February 2019
Guest Skeptic: Andrew Merelman is a critical care paramedic and second year medical student at Rocky Vista University in Colorado. His primary interests are resuscitation, critical care, airway management, and point-of-care ultrasound.
Case: A 60-year-old male is in your emergency department with sepsis from pneumonia. He has worsening work of breathing and a decreasing level of consciousness. You decide based on his clinical presentation that he needs to be intubated. Due to his already poor oxygenation, you are concerned about him desaturating during intubation and wonder if there is anything you can do to help prevent it.
Background: Emergency medicine is often referred to as the ABC (Airway, Breathing and Circulation) specialty. We have covered airway a few times on the SGEM:
SGEM#75: Video Killed Direct Laryngoscopy?
SGEM#96: Machine Head – NIPPV for Out of Hospital Respiratory Distress
SGEM#247: Supraglottic Airways Gonna Save You for an OHCA?
SGEM#249: Ace in the Hole – Confirming Endotracheal Tube Placement with POCUS
SGEM#271: Bougie Wonderland for First Pass Success
Rapid Sequence Intubation (RSI) has been a mainstay of emergency airway management for years. However, there are aspects of the procedure that have been debated, one of which is how best to oxygenate the patient during the apneic period while not increasing rates of aspiration.
Clinical Question: Is bag-mask ventilation (BMV) performed during the apneic period of RSI (defined as the time between administration of RSI medications and intubation) in critically ill adults safe and effective?
Reference: Casey et al. Bag-Mask Ventilation during Tracheal Intubation of Critically Ill Adults. NEJM February 2019
Population: Adult patients (older than 17 years of age) undergoing induction and tracheal intubation in the intensive care unit.
Exclusions: Patients who were pregnant, incarcerated, had immediate need for intubation or if the treating clinicians felt that ventilation was indicated or contraindicated between induction and laryngoscopy.
Intervention: Bag-mask ventilation (BMV) during the time between administration of sedation/paralysis and insertion of the laryngoscope into the mouth for intubation.
Comparison: Apnea with or without nasal cannula oxygen during the time between administration of sedation/paralysis and insertion of the laryngoscope into the mouth for intubation.
Outcome:
Primary Outcome: The lowest oxygen saturation observed during the interval between induction and two minutes after tracheal intubation.
Secondary Outcome: The incidence of severe hypoxemia (oxygen saturation of less than 80%).
Authors’ Conclusions: “Among critically ill adults undergoing tracheal intubation, patients receiving bag-mask ventilation had higher oxygen saturations and a lower incidence of severe hypoxemia than those receiving no ventilation.”
Quality Checklist for Randomized Clinical Trials:
The study population included or focused on those in the emergency department. No
The patients were adequately randomized. Yes
The randomization process was concealed. Yes
The patients were analyzed in the groups to which they were randomized. Yes
The study patients were recruited consecutively (i.e. no selection bias). Unsure
The patients in both groups were similar with respect to prognostic factors. No
All participants (patients, clinicians, outcome assessors) were unaware of group allocation. No
All groups were treated equally except for the intervention. No
Follow-up was complete (i.e. at least 80% for both groups). Yes
All patient-important outcomes were considered. No
The treatment effect was large enough and precise enough to be clinically significant. Unsure
Key Results: They screened 667 patients and enrolled 401. The median age was 60 years, 56% were male and half the patients had sepsis or septic shock.
Bag-mask ventilation group had higher oxygen saturations and less severe hypoxemia compared to the control group.
Primary Outcome: Lowest oxygen saturation
96% (interquartile range, 87% to 99%) in the BMV group vs. 93% (interquartile range, 81% to 99%) in the no-ventilation group (P = 0.01).
Secondary Outcome:
21 patients (11%) in the BMV group had severe hypoxemia vs. 45 patients (23%) in the no-ventilation group (relative risk, 0.48; 95% CI: 0.30 to 0.77).
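The relative risk reported above can be placed alongside the absolute difference. The following sketch uses only the rounded percentages from these notes (11% vs. 23%), so the outputs are approximations rather than the trial's own calculations; the function name is illustrative.

```python
def effect_estimates(risk_treated: float, risk_control: float) -> dict[str, float]:
    """Relative and absolute effect estimates from two event risks (as proportions)."""
    if risk_control <= 0:
        raise ValueError("risk_control must be > 0")
    arr = risk_control - risk_treated  # absolute risk reduction
    return {
        "relative_risk": risk_treated / risk_control,
        "absolute_risk_reduction": arr,
        # NNT is only meaningful when the treated group does better
        "number_needed_to_treat": 1 / arr if arr > 0 else float("inf"),
    }


# Severe hypoxemia: 11% with BMV vs. 23% with no ventilation (rounded values)
for name, value in effect_estimates(0.11, 0.23).items():
    print(f"{name}: {value:.2f}")
```

Reading the relative and absolute estimates together gives a fuller picture than either alone, bearing in mind that severe hypoxemia is still a surrogate outcome.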
1. Patients: Patients in this study were recruited from seven academic intensive care units (ICUs) in the United States. Eighty percent of the patients were intubated for respiratory failure. While many adult patients in the emergency department are intubated for the same reason, many others are intubated for cardiac arrest or trauma, depending on your place of practice. It is unclear if this study population has external validity outside the ICU and to the emergency department.
Another point concerns the patients who were excluded. The study did not enroll patients judged to be at very high risk of desaturation or aspiration, or those who had hypoxemia or acidemia. These patients are ones that we potentially care more about when it comes to peri-intubation oxygenation and ventilation, so it is difficult to say if these results are generalizable to this population.
2. Consecutive Patients: They claim that patients were recruited consecutively. However, selection bias could have been introduced. Patients could be excluded if they required immediate intubation or if the treating clinicians felt that ventilation was indicated or contraindicated between induction and laryngoscopy.
This is pragmatic but it does introduce subjectivity into the process and could have resulted in bias. It is unclear if this would have any meaningful impact on the results.
3. Prognostic Factors: A quality indicator for an RCT is that both the intervention group and control group are similar with regards to prognostic factors. There were statistical differences between the two groups, with 10% more patients having pneumonia and 6% fewer having gastrointestinal bleeding in the control group.
4. Treated Equally: Another quality indicator is that both groups are treated equally except for the intervention. That was not the case in this trial. The BMV group was more likely to be preoxygenated with a BMV (40% vs 11%) while the no ventilation group was more likely to be preoxygenated with NiPPV (24% vs 16%). Preoxygenation can have an impact on likelihood of desaturation during intubation.
Note: The BMV ventilation in this trial was extremely well done. The providers in the trial were trained to provide appropriate rates, volumes, and adequate mask seal. This is not typical in most emergency departments.
5. DOOs, MOO and POO: Their primary and secondary outcomes were disease-oriented outcomes (DOOs) or monitor-oriented outcomes (MOOs). The median lowest oxygen saturation and incidence of severe hypoxia are surrogate markers and do not represent a patient-oriented outcome (POO).
They did look at a number of exploratory-oriented outcomes (EOO) for safety (ex. aspiration, new opacity on chest x-ray and cardiac arrest) and efficacy (ex. mortality, days in ICU and ventilator-free days). However, they did not include what could be considered the most important POO, survival with good neurologic outcome.
Comment on Authors’ Conclusion Compared to SGEM Conclusion: We generally agree with the authors’ conclusions but would also add that a statistical difference in a DOO does not necessarily translate into a clinically important POO.
SGEM Bottom Line: It is unclear if bag-mask ventilation in critically ill adult patients requiring intubation provides a clinically important benefit or is safe.
Case Resolution: Because the patient is at high risk of desaturation during intubation, you make a plan that optimizes preoxygenation. You use your clinical judgment and provide gentle, controlled bag-mask ventilation during the apneic period to prevent desaturation.
Clinical Application: Due to the multiple limitations identified in this trial it is difficult to know how to clinically apply this data. This is a common problem faced by clinicians practicing evidence-based medicine. The literature informs and guides our care but should not dictate our care. When we do not have definitive literature for efficacy or safety we must rely more upon our clinical judgement. In addition, we do not know if BMV will result in a clinically important outcome (survival with good neurologic outcome). This does not mean we should not perform very good preoxygenation prior to intubation.
What Do I Tell My Patient? You have pneumonia and it is making it difficult for you to breathe. We can help by putting a tube in your throat. This will make it easier to breathe and give time for the antibiotics to work. This can be scary. Before we would put the tube down your throat you would get some extra oxygen. Then, if you say OK to the tube, you will get some medicine to relax you and so you will not remember the experience. We will do everything possible to make sure this is successful and there are no complications.
Keener Kontest: There was no winner last week. The correct answer is Michigan is a Native American word meaning Great Water.
Listen to the podcast this week. If you know the answer to the trivia question then send me an email to TheSGEM@gmail.com with “keener” in the subject line. The first correct answer will receive a cool skeptical prize.
Other FOAMed:
First10EM: PreVent Trial
EM Nerd: The Case of the Conspicuous Conclusion
REBEL EM: PreVent BMV Prior to Intubation
The Resus Room: Managing the Apneic Period - The PreVent Trial
St. Emlyn's: Ventilation During RS

Jan 22, 2020 • 10min
SGEM Xtra: It’s All About the Bayes, ‘Bout the Bayes, No Fisher
Guest Skeptic: Dr. Dan Lane has a Masters in Health Services Research at the University of Calgary, a Doctor of Philosophy in Clinical Epidemiology from the University of Toronto and is currently a medical student at the University of Calgary.
Dan is naturally a contrarian, he strives to understand first principles of conventions in medical research in order to identify and challenge poor practices that have become dogma. He is passionate about statistics and epidemiology and wants to share that passion by making these topics more practical and approachable for clinicians. Believing the key to proper interpretation of medical research does not begin with memorizing some arbitrary threshold for statistical significance, Dan hopes to contribute to the SGEM through sharing an understanding of what story the numbers are actually telling about the data. Dan has no funding whatsoever, and no associations with industry. He is currently a medical student at the University of Calgary.
Dan has some pet peeves when it comes to statistics, how they are used, and critical appraisals. We will do some more in-depth SGEM Xtras on each of these issues.
Thomas Bayes
Absolute vs. Relative Estimates
Effect Estimates and Not P-Values
All Models are Wrong
Prediction vs. Classification
Bayes No Frequentists
The purpose of this SGEM Xtra, besides introducing a new SGEM faculty member, is to announce that we are adding a new segment to the SGEM. It is going to be called Statistically Significant.
We want to make the SGEM even better and address some of the criticisms from the ClinEpi world about clinicians trying to do critical appraisal. In order to do that we now have Dr. Dan Lane, PhD, who will be commenting on each of the SGEM episodes.
The first instalment of the Statistically Significant segment will be on this week’s SGEMHOP looking at troponin testing in elderly patients presenting with non-specific complaints (SGEM#280). Let me know what you think of this idea. We have a few more lined up and feedback is always appreciated. Send me an email at TheSGEM@gmail.com
Statistically Significant #280: Sensitivity and Specificity
Despite their dogmatic use in the literature, sensitivity and specificity have a number of limitations that are rarely considered or addressed in diagnostic test studies.
Sensitivity and Specificity are crude metrics, meaning they only look at the effect of a single measure and a single outcome. As crude measures they fail to incorporate any other information into their estimates, including potential confounders for the relationship between the test result and the outcome.
In this particular study, age is part of the primary objective for the study (geriatric patients) but is also a confounder of the relationship between troponin level (which may increase with age) and acute coronary syndrome risk (which also increases with age). When confounders like age are present, crude measures will be influenced by the prevalence of confounders in each of the groups – for example, if there were more older patients in the troponin positive group, the estimates for sensitivity may be inflated.
Another limitation of sensitivity and specificity is that they require a test result to be classified as positive or negative. This is problematic when the real measure is continuous, such as troponin. In the current study the test was considered “positive” if the troponin level was above the 99th percentile for that assay. But this arbitrarily treats patients above or below the 99th percentile as homogeneous groups, meaning the statistics consider everyone above the threshold to be the same, and everyone below the threshold to be the same.
Consider a patient with a troponin right below the threshold and another patient right above the threshold – surely these patients are almost identical in terms of their risk for having ACS. But by inserting an arbitrary break into the measure, the statistics will treat them as different resulting in more misclassifications simply because a threshold for positive or negative was selected.
Instead of these binary classifications, researchers could focus directly on the patient’s risk of the outcome. This can be represented using probabilities and a smooth curve that shows the probability of ACS based on the exact troponin value. Using simple statistical models, these probability estimates can be adjusted for confounders, like age, and provide easily interpretable probability estimates for the entire range of troponins – no classification required!
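To make the contrast concrete, here is a minimal sketch of a logistic model that returns a probability of ACS from the exact troponin value and age. Every number in it (the intercept, both coefficients, the 14 ng/L cut-off) is a hypothetical value chosen for illustration and was not estimated from the study data.

```python
import math

# All values below are hypothetical, for illustration only.
INTERCEPT = -9.0
BETA_LOG_TROPONIN = 1.2   # effect per unit of log(troponin in ng/L)
BETA_AGE = 0.03           # effect per year of age
THRESHOLD_NG_L = 14.0     # assumed 99th-percentile cut-off


def acs_probability(troponin_ng_l, age_years):
    """Probability of ACS from a continuous troponin value, adjusted for age."""
    linear = (INTERCEPT
              + BETA_LOG_TROPONIN * math.log(troponin_ng_l)
              + BETA_AGE * age_years)
    return 1.0 / (1.0 + math.exp(-linear))


def dichotomized(troponin_ng_l):
    """The binary classification that sensitivity and specificity rely on."""
    return "positive" if troponin_ng_l > THRESHOLD_NG_L else "negative"


# Two 78-year-old patients on either side of the cut-off.
just_below = acs_probability(13.9, 78)
just_above = acs_probability(14.1, 78)
print(dichotomized(13.9), f"{just_below:.4f}")
print(dichotomized(14.1), f"{just_above:.4f}")
```

The two patients receive opposite labels from the threshold, while the model gives them almost the same probability, which is the point made in the paragraphs above.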
References:
Amrhein, Greenland and McShane. Scientists rise up against statistical significance. Nature 2019
Regina Nuzzo. Statistical errors: P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume. Nature 2014
Fatovich and Phillips. The probability of probability and research truths. AEM 2017
Greenland et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. EJE 2016
Guggenmoos-Holzmann and van Houwelingen. The (In)Validity of sensitivity and specificity. Statistics in Medicine 2000
Remember to be skeptical of anything you learn, even if you heard it on the Skeptics' Guide to Emergency Medicine.

Jan 18, 2020 • 30min
SGEM#280: This Old Heart of Mine and Troponin Testing
Date: January 16th, 2020
Reference: Troponin Testing and Coronary Syndrome in Geriatric Patients With Nonspecific Complaints: Are We Overtesting? AEM January 2020
Guest Skeptics:
Dr. James VandenBerg: James has a master’s degree in clinical investigation from Washington University in St. Louis, and is currently the Chief Resident at Detroit Receiving Hospital.
Dr. Andrew Huang: Andy is the Chief Resident at Sinai-Grace Hospital.
Case: As the resident, you have just finished seeing a 78-year-old male who has been brought in by his family over the holidays. The triage nurse has put the reason for the visit as “multiple complaints”. Despite spending 30 minutes in the room, you still are not sure exactly why the patient is here.
Your attending says that if you take a good geriatric history that you can always determine what’s going on. However, 15 minutes later your attending leaves the room defeated. The patient’s complaints are just so nonspecific.
The attending ends up ordering the “geriatrogram” – ticking off every blood test on the form, including the troponin. You turn to the attending and ask, “do you really think this could be acute coronary syndrome (ACS)?”
Background: Patients 65 years and older account for about 15% of emergency department visits in the United States. Their presentations are often complicated as they present with nonspecific symptoms, and there are often obscuring co-morbid conditions, polypharmacy, and cognitive/functional impairment.
Nonspecific symptoms in the elderly usually yield a broad differential and there are no recommended diagnostic algorithms, leading to extensive testing. ACS is usually amongst this differential, as cardiovascular disease is a leading cause of morbidity and mortality in this population.
Additionally, the elderly population with ACS more commonly presents without chest pain compared to younger patients (up to 20% of elderly patients with MI present with “weakness” as part of their chief complaint). While cardiovascular disease is the leading cause of mortality and morbidity in the elderly, the frequency of ACS amongst this population presenting with nonspecific symptoms is unknown.
Clinical Question: What is the frequency of ACS in elderly patients presenting to the ED with nonspecific complaints, and what is the utility of troponin testing in this population?
Reference: Wang et al. Troponin Testing and Coronary Syndrome in Geriatric Patients With Nonspecific Complaints: Are We Overtesting? AEM January 2020
Population: Patients aged 65 years and older presenting to the emergency department with nonspecific chief complaints who underwent troponin testing. “Nonspecific” was defined a priori as including weak or weakness, dizzy or dizziness, fatigue, lethargy, altered mental status, light-headedness, medical problem, examination requested, failure to thrive, or “multiple complaints.”
Exclusions: If they had a focal chief complaint (ex. focal pain, injury complaint, shortness of breath, vomiting, diaphoresis, syncope, fever, cough, focal neurologic deficit) or fever of at least 38°C at triage.
Investigation: Troponin testing
Comparison: None
Outcomes: There were multiple outcomes of interest:
The proportion of patients with nonspecific complaints who underwent troponin testing.
The proportion of such patients who had elevated troponin.
The proportion of patients with ACS at the index visit or within 30 days.
The utility of troponin testing to diagnose or exclude ACS.
The frequency of other causes of troponin elevation in this population.
Dr. Alfred Wang
This is a LIVE episode of an SGEMHOP which means we have the lead author on the show. Dr. Alfred Wang is an emergency medicine physician at Indiana University in Indianapolis, IN. With help from a dedicated team of physician peers and a mentor, Dr. Wang was able to complete this research project.
Authors’ Conclusions: “While consideration for ACS is prudent in selected elderly patients with nonspecific complaints, ACS was rare and no patients received reperfusion therapy. Given the false-positive rate in our study, our results may not support routine troponin testing for ACS in this population.”
Quality Checklist for A Chart Review: There is a quality check list for ED studies that was published by Gilbert et al in Annals of EM 1996. It had eight items. The list was updated and expanded by Dr. Andrew Worster from BEEM to include 12 items.
The authors of this retrospective chart review did a great job and 11 out of 12 answers were yes. The only “no” was that they did not have a management plan described for missing data in the publication.
Abstractor Training: Were the abstractors trained before the data collection? Yes
Case Selection Criteria: Were the inclusion and exclusion criteria for case selection defined? Yes
Variable Definition: Were the variables defined? Yes
Abstraction Forms: Did the abstractors use data abstraction forms? Yes
Performance Monitored: Was the abstractors’ performance monitored? Yes
Blinding to Hypothesis: Were the abstractors aware of the hypothesis/study objectives? Yes
Inter Rater Reliability (IRR) Mentioned: Was the interobserver reliability discussed? Yes
IRR Tested: Was the interobserver reliability tested or measured? Yes
Medical Record Identified: Was the medical record database identified or described? Yes
Sampling Method: Was the method of sampling described? Yes
Missing Data Management Plan: Was the statistical management of missing data described? No
Institutional Review Board Approved: Was the study approved by the institutional or ethics review board? Yes
A chart review is a type of observational study. We do have an SGEM quality check list for observational studies.
Quality Checklist for Observational Study:
Did the study address a clearly focused issue? Yes
Did the authors use an appropriate method to answer their question? Yes
Was the cohort recruited in an acceptable way? Yes
Was the exposure accurately measured to minimize bias? Unsure
Was the outcome accurately measured to minimize bias? Yes
Have the authors identified all important confounding factors? Unsure
Was the follow up of subjects complete enough? No
How precise are the results? Precision was poor. The 95% confidence interval for sensitivity was 48-100%. Specificity was better at 77-85%, but we must remember these measures are correlated, and therefore the poor sensitivity is also a reflection on specificity. Had they picked a different cut-off for troponin, they could have improved the sensitivity (at a cost to the specificity).
Do you believe the results? Yes
Can the results be applied to the local population? Unsure
Do the results of this study fit with other available evidence? Unsure
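On the precision item above, the wide interval for sensitivity follows from how few ACS cases there were. The sketch below assumes an exact (Clopper-Pearson) interval, which has a simple closed form when every case is detected. Whether the authors used this exact method is an assumption, but it reproduces the reported lower bound for 5 of 5 cases detected.

```python
def exact_ci_all_detected(n_cases, confidence=0.95):
    """Clopper-Pearson interval for a proportion when all n_cases are detected.

    With x = n the upper bound is 1 and the lower bound is (alpha/2)**(1/n).
    """
    alpha = 1.0 - confidence
    lower = (alpha / 2.0) ** (1.0 / n_cases)
    return lower, 1.0


lower, upper = exact_ci_all_detected(5)
print(f"{lower:.0%} to {upper:.0%}")  # 48% to 100%
```

With only five cases, even a perfect detection record cannot rule out a true sensitivity below 50%.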
Key Results: They initially identified 1,146 potentially eligible patients. After excluding the patients who had a specific complaint listed and those with documented fever, they were left with a total of 594 patients. Of those, 69% had troponins ordered. The average age of the cohort was 78 years old, 58% were female, and 75% were admitted. The most common chief complaints were altered mental status (43%), weakness/fatigue (33%), and dizziness (21%).
The proportion of patients with nonspecific complaints who underwent troponin testing: 412/594 (69%)
The proportion who had an elevated troponin in the ED: 52/412 (12.6%) (Another 30 patients had an elevated troponin at some point during their hospital stay)
The proportion of patients with ACS at the index visit or within 30 days: 5/412 (1.2%) All occurred during the index admission.
The utility of troponin testing to diagnose or exclude ACS. Looking only at the first troponin in the ED, it was 80% sensitive and 88% specific (NPV = 99.7%, PPV = 7.7%) for ACS. The LR+ was 6.67, and LR– was 0.23. Considering all troponins, the sensitivity was 100% (95% CI = 48%–100%), the specificity was 81% (95% CI = 77%–85%), the NPV was 100%, and the PPV was 6.1%.
The frequency of other causes of troponin elevation in this population. There was a long list of non-ACS causes of troponin elevation. The top 3 causes were: dehydration, heart failure, and atrial fibrillation.
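The first-troponin figures above can be reproduced from a 2x2 table. The counts in this sketch are inferred from the reported numbers (412 patients tested, 52 elevated first troponins, 5 ACS cases, 80% sensitivity) and are an assumption; the paper's own table may differ.

```python
# Counts inferred from the reported figures, not copied from the paper.
TP, FN = 4, 1      # ACS cases with elevated / normal first troponin
FP, TN = 48, 359   # non-ACS patients with elevated / normal first troponin


def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


metrics = diagnostic_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these inferred counts the LR+ comes out near 6.8 rather than 6.67. The reported 6.67 is what you get by dividing 0.80 by (1 - 0.88), that is, by using the rounded specificity.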
We asked Dr. Wang ten questions to get a greater understanding of his publication. Listen to the SGEMHOP podcast to hear all of Dr. Wang's answers.
Dr. James VandenBerg
Defining “Non-Specific”: The definition of “non-specific” symptoms is problematic while at the same time being pragmatic. For instance, “dizzy” could be construed as non-specific, but what if the patient had supporting focalized neurologic complaints? Additionally, some physicians list the chief complaint as the leading sentence a patient provides. This is problematic if a patient initially cites a “non-specific” complaint, but then describes suggestive ACS symptoms in their HPI. Conversely, “focal” chief complaints such as “shortness of breath” can be construed as non-specific in real practice based on the patient’s HPI, but due to the paper’s inclusion criteria, if any triage nurse or physician labeled a chief complaint as “focal” they would be excluded.
Chief Complaints Not Equal: Definitions of nonspecific included a spectrum of complaints, from altered mental status to failure to thrive. I imagine the yield of testing is much higher in altered mental status than it is in failure to thrive. Would there be a benefit of considering these chief complaints separately?
Retrospective Charting: You excluded patients who had nonspecific complaints at triage, but had a focal complaint listed in the ED physician note. The ED physician note might have been written after the troponin result was known. In the presence of a positive troponin, focal complaints might have been emphasized, despite being originally nonspecific.

Jan 11, 2020 • 33min
SGEM#279: Do You Really Want to Hurt Me and Use a Placebo Control for a Migraine Trial?
Date: January 10th, 2020
Reference: Dodick DW et al. Ubrogepant for the Treatment of Migraine. NEJM 2019
Guest Skeptic: Dr. Anand Swaminathan is an Assistant Professor of Emergency Medicine at St. Joseph’s Hospital in Paterson, NJ. He is also the managing editor of EM:RAP and associate editor at REBEL EM.
Case: A 23-year-old man with a history of migraines presents with two days of headache, nausea and photophobia typical of his prior migraines. He’s tried a number of medications at home including ibuprofen, acetaminophen, aspirin and sumatriptan without any considerable improvement in symptoms. You start to offer him your standard medications like metoclopramide and haloperidol when he asks about a new drug he heard about called ubrogepant.
Background: Migraine headaches are a chronic neurologic disease characterized by throbbing, often unilateral headaches that are often associated with nausea, vomiting, photophobia and phonophobia. It is a common disease and can be severe enough to impede on people’s lives.
Headaches themselves are not only a common emergency department presentation but one that is filled with potential dangers. There are a number of causes of headache that are life and limb threatening – subarachnoid hemorrhage (SGEM#201), meningitis, encephalitis, cerebral venous thrombosis and vertebral artery dissection, among others – but most headaches are benign in nature.
There is an international classification system of headaches (IHS 2018). The current system classifies them into primary and secondary headaches. An important part of our job as emergency physicians is to differentiate the lethal headache from the benign headache.
Though we rarely make a de novo diagnosis of migraine in the emergency department, many patients with migraines present to us for symptom management. The pathophysiology of migraines is both complicated and poorly understood but there are a number of potential treatments including NSAIDs, acetaminophen, aspirin, neuroleptics, triptans and even propofol.
More recently, calcitonin gene-related peptide antagonists (CGRPs) have emerged as a new potential treatment. The first big study that came out on these drugs was published in the NEJM in 2019 and was entitled Rimegepant, an Oral Calcitonin Gene-Related Peptide Receptor Antagonist for Migraine (Lipton et al).
Now, we have a second study published in the NEJM on a related drug, ubrogepant.
Clinical Question: Does ubrogepant increase the percentage of patients who were free from pain and absent of the most bothersome migraine-associated symptom at two hours from initial dose in comparison to placebo?
Reference: Dodick DW et al. Ubrogepant for the Treatment of Migraine. NEJM 2019
Population: Adult patients (18-75 years of age) with at least a one-year history of migraine with or without aura that met criteria from the International Classification of Headache Disorders and had migraine onset before the age of 50. Patients had to have a history of migraines lasting between 4-72 hours and a history of migraine attacks separated by at least 48 hours of freedom from headache. Additionally, they had to have suffered from two to eight migraines per month over the last three months.
Exclusions: Patients with 15 or more headaches/month on average in the previous six months. Difficulty distinguishing the type of headache. Use of acute migraine treatment on ten or more days in the previous three months. Participation in a trial involving CGRP. Clinically significant cardiovascular or cerebrovascular disease. History of hepatitis in the last six months or laboratory findings of liver disease (elevated ALT, AST, bilirubin or low serum albumin).
Additional Exclusions from ClinicalTrials.gov
Has a history of migraine aura with diplopia or impairment of level of consciousness, hemiplegic migraine, or retinal migraine
Has a current diagnosis of new persistent daily headache, trigeminal autonomic cephalgia (eg, cluster headache), or painful cranial neuropathy
Required hospital treatment of a migraine attack 3 or more times in the previous 6 months
Has a chronic non-headache pain condition requiring daily pain medication
Has a history of malignancy in the prior 5 years, except for adequately treated basal cell or squamous cell skin cancer, or in situ cervical cancer
Has a history of any prior gastrointestinal conditions (eg, diarrhea syndromes, inflammatory bowel disease) that may affect the absorption or metabolism of investigational product; participants with prior gastric bariatric interventions which have been reversed are not excluded
Intervention: Ubrogepant 50 mg or 100 mg
Comparison: Placebo
Outcomes:
Co-Primary Outcome: Freedom from pain at two hours from initial dose of medication. Absence of the most bothersome symptom associated with migraine two hours from initial dose of medication.
Secondary Outcomes: Change in severity of headache at two hours, sustained pain relief, sustained freedom from pain, absence of photophobia, absence of phonophobia and absence of nausea at two hours from initial dose. Adverse events were also collected.
Authors’ Conclusions: “A higher percentage of participants who received ubrogepant than of those who received placebo had freedom from pain and absence of the most bothersome symptom at 2 hours after the dose. The most commonly reported adverse events were nausea, somnolence, and dry mouth. Further trials are needed to determine the durability and safety of ubrogepant for acute migraine treatment and to compare it with other drugs for migraine.”
Quality Checklist for Randomized Clinical Trials:
The study population included or focused on those in the emergency department. No
The patients were adequately randomized. Yes
The randomization process was concealed. Yes
The patients were analyzed in the groups to which they were randomized. No
The study patients were recruited consecutively (i.e. no selection bias). Unsure
The patients in both groups were similar with respect to prognostic factors. Unsure
All participants (patients, clinicians, outcome assessors) were unaware of group allocation. Yes
All groups were treated equally except for the intervention. Yes
Follow-up was complete (i.e. at least 80% for both groups). No
All patient-important outcomes were considered. No
The treatment effect was large enough and precise enough to be clinically significant. Unsure
Key Results: They enrolled 1,672 patients with roughly equal numbers allocated to each of the three groups. The mean age was around 40 years and almost 90% were female. The modified ITT analysis excluded 345 (21%) of participants.
Ubrogepant was superior to placebo in treating migraine headaches.
Primary Outcomes: (100mg/50mg/placebo)
Freedom from pain at two hours: 21%/19%/12% (both doses were statistically better than placebo, but neither dose was better than the other). That gives an absolute difference of about 8% and a Number Needed to Treat for Benefit (NNTB) of 13.
Absence of most bothersome symptom at two hours: 38%/39%/28%. This is an absolute difference of 10% with a NNTB of 10.
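As a rough check on the NNTB figures above, here is a minimal sketch that converts an absolute difference in percentage points into an NNTB. It uses the rounded percentages reported here, and rounding up to a whole patient is a convention assumed for this sketch.

```python
import math


def nntb(absolute_difference_pct):
    """NNTB from an absolute difference in percentage points, rounded up."""
    if absolute_difference_pct <= 0:
        raise ValueError("no benefit over the comparison group")
    return math.ceil(100 / absolute_difference_pct)


# Rounded pain-freedom percentages at two hours, as reported above.
pain_free = {"100 mg": 21, "50 mg": 19, "placebo": 12}

for dose in ("100 mg", "50 mg"):
    difference = pain_free[dose] - pain_free["placebo"]
    print(dose, f"difference {difference}%", "NNTB", nntb(difference))

print("about 8% overall, NNTB", nntb(8))
print("most bothersome symptom, about 10%, NNTB", nntb(10))
```

Because the absolute differences are small, a one or two point change in the difference moves the NNTB noticeably, so these values are best read as approximate.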
Secondary Outcomes:
Pain relief at two hours (61%/61%/49%) and sustained pain relief (38%/36%/21%) was better with ubrogepant compared to placebo.
Serious Adverse Events: There were five SAEs, all of them in the intervention group (two appendicitis, pericardial effusion, spontaneous abortion and seizure). Only the seizure was considered related to the trial drug. Six patients had ALT levels three times the upper limit of normal (one in the placebo group and five in the treatment group). Only one of those in the treatment group was considered possibly related to the trial regimen. Details are in the supplemental appendix.
1. Patients: We had a few issues with the patients included in this study. First, these were not emergency department patients but rather those recruited from outpatient clinics. Whether or not these are the same patients that present to the emergency department is unknown.
We are also unsure if the patients were recruited consecutively. This is an important aspect to avoid potential selection bias. Remember that when we use the term “bias” we are not talking about random noise in the data but something that systematically moves us away from the “truth”.
The third question we had about the included patients was whether or not both groups were similar with respect to prognostic factors. Baseline demographics are reported in Table 1. However, things like number of headaches/month, refractory headaches in the past, and other things are not reported. This could impact the results and therefore the conclusions.
2. Comparison to Placebo: Randomized control trials (RCTs) are considered an ideal study design to establish causality and effect of a medication. Drug intervention RCT design requires that the intervention be compared to something (active drug, standard treatment, no treatment or placebo).
It is widely agreed upon that comparison to placebo is acceptable when no proven intervention exists (Millum and Grady 2013). In contrast, placebo comparison is not considered acceptable in life-threatening conditions if there is an available treatment that is known to prolong life. The use of placebo for comparison in non-life-threatening conditions has been hotly debated for decades, particularly when an accepted treatment exists.
The argument against the use of placebos in these circumstances is guided by the Declaration of Helsinki. This document states:
“In any medical study, every patient — including those of a control group, if any — should be assured of the best proven diagnostic and therapeutic methods.”
Thus, if an effective treatment exists, it should be prescribed to patients (Simon 2000).

Dec 28, 2019 • 28min
SGEM Xtra: Come Together, Right Now – Over Renal Colic
Date: December 16th, 2019
Reference: Moore et al. Imaging in Suspected Renal Colic: Systematic Review of the Literature and Multispecialty Consensus. Annals of EM, JU, and JACR 2019.
Dr. Chris Moore
Guest Skeptics: Dr. Christopher Moore is an Associate Professor of Emergency Medicine at Yale School of Medicine. He is also the Chief for the Section of Emergency Ultrasound and Director of the Emergency Ultrasound Fellowship.
Dr. Kevan Sternberg is an Associate Professor of Urology at the University of Vermont Medical Center.
This is an SGEM Xtra and is a result of a paper that was published in three journals (Annals of EM, Journal of Urology and Journal of the American College of Radiology). The paper was about what is the best diagnostic imaging modality for renal colic.
Renal Colic on the SGEM:
SGEM#4: Getting Unstoned
SGEM#32: Stone Me
SGEM#71: Like a Rolling Stone
SGEM#97: Hippy Hippy Shake – Ultrasound Vs. CT Scan for Diagnosing Renal Colic
SGEM#154: Here I Go Again, Kidney Stone
SGEM#202: Lidocaine for Renal Colic?
SGEM#220: Acupuncture Morphine for Renal Colic
SGEM#230: Tamsulosin – You’ve Lost that Loving Feeling – For Renal Colic
Dr. Kevan Sternberg
There are greater than two million annual emergency department (ED) visits for suspected renal colic in the United States, and computed tomography (CT) scanning is now performed for more than 90% of patients who receive a diagnosis of kidney stone.
Despite a significant increase in CT use for diagnosis during the last two decades, patient-centered outcomes such as admission and intervention do not appear to have been affected.
There was a trial published in 2014 comparing radiology department ultrasound, POCUS and CT for suspected nephrolithiasis (Smith-Bindman et al. NEJM 2014). We covered this on SGEM#97 with Dr. Tony Seupaul and Dr. Spencer Wright. The bottom line from that episode was bedside emergency department ultrasound is safe and has several advantages over CT for the diagnosis of kidney stones.
Despite this evidence, recent data suggest that ultrasonography is used for less than 7% of patients receiving a diagnosis of kidney stone, and CT use has continued to increase. Similarly, although reduced-radiation-dose CT is recommended for the evaluation of renal colic, it is used for less than 10% of patients with kidney stone.
What did you do in this study?
We sought out a nine-member panel with representation from three specialty societies: ACEP, the American College of Radiology, and the American Urological Association.
How did you decide who was on the panel?
All panel members were board-certified practicing academic physicians and were nominated according to previous work on specialty-specific guidelines.
Clinical Question: For patients presenting to the ED with pain suspected to be uncomplicated renal colic, what imaging should be pursued compared with standard noncontrast CT scanning to optimize patient-centered outcomes?
To answer the question, you did a systematic review of the literature using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines.
After reviewing and summarizing the literature for imaging modalities, we delineated specific clinical scenarios to illustrate decision making with respect to initial imaging. We came up with a total of 29 clinical vignettes representing a balance of possible permutations (age, sex, pregnancy status, likelihood of stone disease, and likelihood of acute alternative diagnosis).
How did you come up with consensus on the best diagnostic imaging strategy?
Consensus was sought with a modified Delphi process that included three rounds of anonymous voting, with two group discussions between rounds. All nine members of the group answered the vignettes in a blinded fashion.
What were the three imaging options?
For purposes of defining consensus, imaging modalities were separated into three groups (no further imaging, ultrasonography, and CT), although subtypes within imaging modalities are reported.
What were the results?
We reached at least moderate consensus in all 29 scenarios, with perfect or excellent consensus in 80%.
Five Major Themes
Younger Patients (~35 years old): Even without a history of stones, CT may be avoided as long as pain is controlled (perfect consensus).
Middle-Aged Patients (~55 years old): We recommend CT if there is no history of kidney stones.
Older Patients (~75 years old): We recommend CT regardless of history.
Pregnant and Pediatric Patients: With a typical presentation they should undergo ultrasonography and do not require initial CT if symptoms are relieved.
Radiation Dose: We recommend reduced-radiation-dose CT whenever CT is used for suspected renal colic.
Were there any limitations you identified?
There are many more than 29 possible clinical scenarios. We chose this number because it seemed to be the best balance of major factors with the least number of scenarios. The scenarios are also skewed toward those in which the clinical likelihood of a kidney stone is high according to objective criteria. Although we did include scenarios with stone being less likely and found that in these scenarios practitioners were more likely to request CT, there may have been a bias toward assuming these scenarios represented kidney stone and no other diagnosis.
Happy New Year to all the SGEMers. We will be back in 2020 with more critical appraisals of recent publications.
Trying to cut the knowledge translation window down from over ten years to less than one year using the power of social media (FOAMed).
Our ultimate goal is for patients to get the best care, based on the best evidence.
Remember to be skeptical of anything you learn, even if you heard it on the Skeptics’ Guide to Emergency Medicine.

Dec 21, 2019 • 32min
SGEM Xtra: NNT – WET or DRI
Date: December 17th, 2019
Reference: Reeves and Reynolds. The NNT-WET and NNT-DRI: (Mostly) Satirical New Metrics to Emphasize the Inherent Inefficiency of Clinical Practice. AEM Dec 2019.
Guest Skeptics: Dr. Mathew Reeves is a Professor and Interim Chair of the Department of Epidemiology and Biostatistics at the College of Human Medicine at MSU.
Dr. Joshua Reynolds is an Associate Professor of Emergency Medicine at the College of Human Medicine at MSU. Outside of his academic duties, he works clinically in the adult ED at Spectrum Health in Grand Rapids, Michigan, the tertiary care center for Western Michigan.
This is an SGEM Xtra and is a result of the December AEM publication suggesting new metrics to emphasize the inherent inefficiency of clinical practice. This (mostly) satirical article seems to be in the same theme of the annual BMJ holiday edition.
The BMJ has published some great studies in their holiday edition. We have covered two of them on the SGEM:
SGEM#6: Orthopedic Surgeons: Strong AND Smart!
SGEM#23: A Bump Up Ahead (Diagnosis of Appendicitis)
One of my other favourite BMJ holiday edition articles has been the classic parachute trial (Smith and Pell BMJ Dec 2003). Parachutes have been used for years to prevent orthopaedic, head and soft tissue injuries after a gravitational challenge (jumping out of planes). There was observational data that showed parachute use led to injury and there were case reports of people surviving falling/jumping out of a plane without a parachute or it opening properly. They could find no randomized controlled trials (RCT) to include in their systematic review and meta-analysis (SRMA).
The authors suggested taking evidence-based medicine (EBM) advocates up in a plane and have them randomized in a double-blinded fashion to a parachute or a sham (backpack). It would be a cross over trial. Those participants who survived the first jump would be randomized into the opposite group. Only then would there be definitive evidence for the efficacy of parachutes.
Since that SRMA was published in 2003, a randomized control trial has been conducted and published on the topic of parachutes. It was published last year in the 2018 BMJ holiday edition (Yeh et al BMJ Dec 2018). It will be covered as an SGEM Xtra in 2020.
NNT: Number Needed to Treat
Dr. Joshua Reynolds
The NNT stands for the Number Needed to Treat. It estimates the average number of patients who need to be treated to positively impact one person with therapeutic benefit. It was originally described in 1988 by Andreas Laupacis, an internist and clinical epidemiologist who was at McMaster University in Ontario at the time. (Laupacis A et al NEJM 1988).
How is the NNT Described Mathematically?
The "number needed to do anything" is the inverse of the absolute change in risk. So in this case, the number needed to treat is the inverse of the absolute risk reduction (ARR): NNT = 1/ARR
An Example of Calculating the NNT
Let’s say that there is a new drug to treat a bad disease and it reduces mortality from 25% to 15%. The absolute risk reduction is 10%, so NNT is the inverse of 0.1 and the NNT would be 10. Likewise, if that same drug reduces mortality from 25% to 20%, then the absolute reduction is 5% and the NNT would be 20, or 1 divided by 0.05.
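The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `number_needed_to_treat` and its parameters are made up for this sketch, and risks are passed as proportions (0.25 for 25%).

```python
def number_needed_to_treat(control_risk, treated_risk):
    """Return NNT = 1 / ARR, where ARR = control risk - treated risk."""
    arr = control_risk - treated_risk  # absolute risk reduction
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so NNT is undefined")
    return 1 / arr


# The two scenarios from the example: mortality falling from 25% to 15%,
# and from 25% to 20%.
for treated in (0.15, 0.20):
    nnt = number_needed_to_treat(0.25, treated)
    print(f"25% -> {treated:.0%}: NNT is about {nnt:.0f}")
```

The result is returned as a float and only rounded for display, because floating-point subtraction can leave values a hair above or below a whole number.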
What is An Advantage To Using the NNT?
One advantage is that using a single number, NNT describes the absolute impact or effectiveness of a particular therapy. Interventions with lower NNT are considered more efficacious, since one must theoretically treat fewer patients to observe an effect.
Is the NNT Popular?
Yes, there is an entire Internet domain devoted to NNT (www.thennt.com). This site extols the virtues of NNT to promote the most effective therapies while questioning those with insufficient benefit.
NNH: Number Needed to Harm
Dr. Mathew Reeves
When quantifying the harms associated with treatment, the corollary to NNT is “number needed to harm” (NNH), which is calculated as the inverse of absolute risk increase: NNH = 1/Absolute Risk Increase
The NNH estimates the average number of patients who need to be treated before one person is negatively impacted by a harmful side effect caused by the therapy.
Interventions with higher NNH are theoretically less risky, since more patients can be treated before an adverse treatment-related event occurs. When combined with NNT, these two numbers convey to patients, in a simple manner, the trade-offs between risks and benefits of treatment. Presumably, this is a simpler method to convey risks and benefits to patients than trying to describe relative or absolute changes in risk, or trying to describe odds ratios.
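A minimal sketch of the NNH calculation follows. The function name `number_needed_to_harm` is hypothetical, and the 7% and 2% adverse event rates are invented purely to show the arithmetic; they do not come from any study.

```python
def number_needed_to_harm(treated_event_risk, control_event_risk):
    """Return NNH = 1 / ARI, where ARI = treated risk - control risk."""
    ari = treated_event_risk - control_event_risk  # absolute risk increase
    if ari <= 0:
        raise ValueError("no absolute risk increase, so NNH is undefined")
    return 1 / ari


# Invented numbers: an adverse event occurs in 7% of treated patients
# and 2% of untreated patients, an absolute risk increase of 5%.
nnh = number_needed_to_harm(0.07, 0.02)
print(f"NNH is about {nnh:.0f}")
```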
Are There Limitations to Using the NNT and NNH?
Yes, there are a number of limitations to using NNT and NNH estimates. One key issue is that you must know what time period these estimates are based upon. Every NNT and NNH has an explicit time period associated with the metric. Also, the NNT and NNH do not capture clinical relevance or cost. The NNT may be very low for the primary outcome of a study, but if it is a disease-oriented outcome (DOO) with no patient-oriented outcome (POO), the NNT can be misleading. Another limit is cost. How much money does the intervention cost? If it costs pennies and has a very low NNT, that would be great, but a treatment with the same NNT that costs millions of dollars would not be as good.
For more on this issue check out PEM Super Hero, Dr. Anthony Crocco (SGEM faculty member) who has a great white board video explaining the concept and application of the NNT and NNH (SketchyEBM).
NNT-WET: Number Needed to Waste Everybody’s Time
Lost in the populist enthusiasm for NNT is its inherent mathematical complement, which is a more realistic and clinically useful number for the practicing physician. Thus, we propose the “number needed to waste everybody’s time” (NNT-WET). The NNT-WET = NNT - 1.
The NNT-WET estimates the average number of patients who need to be treated, but receive no therapeutic benefit, for someone else to benefit. NNT-WET is a direct measure of the inefficiency of clinical practice; it conveys the ineffectiveness of clinical interventions by measuring the effort required to help just one solitary patient.
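In the same (mostly) satirical spirit, the NNT-WET can be sketched in Python. The function names are hypothetical. The `share_without_benefit` helper is an addition for this sketch rather than something proposed in the article; it simply restates (NNT - 1)/NNT as a proportion.

```python
def nnt_wet(nnt):
    """Return NNT-WET = NNT - 1: treated patients who get no benefit."""
    if nnt < 1:
        raise ValueError("NNT must be at least 1")
    return nnt - 1


def share_without_benefit(nnt):
    """Fraction of treated patients who, on average, do not benefit."""
    return nnt_wet(nnt) / nnt


# An NNT of ten means nine patients are treated without benefit.
nnt = 10
print(f"NNT {nnt}: NNT-WET {nnt_wet(nnt)}, "
      f"{share_without_benefit(nnt):.0%} of treated patients do not benefit")
```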
In the postmodern era of limited medical resources and therapeutic nihilism, NNT-WET is the metric that provides the appropriate level of cynicism required by today’s practicing clinician.
Are There Any Advantages to Using the NNT-WET Over the NNT?
Yes, there are several. First, NNT fails to sufficiently emphasize that most patients do not benefit from treatments routinely used in clinical practice. Since the vast majority of NNT estimates exceed two, a given individual patient is unlikely to benefit from treatment (Figure 1A).
How Does the NNT-WET Change the Conversation?
The NNT-WET shifts the clinical conversation from the assumption that we must treat the patient (e.g., “This treatment is great—its NNT is only ten!”), to a state best described as therapeutic malaise (e.g., “The NNT-WET is nine . . . what’s the point?”). The NNT-WET helps illustrate that for most treatments, the costs, inconvenience, and risks are disproportionately applied to the many, so that only a single person (who, most importantly, is not you!) can benefit.
But really this approach should be tested empirically. For example, one could present clinical scenarios to patients and/or clinicians detailing effect measures of proposed treatments with NNT or NNT-WET (not to mention absolute or relative risk reduction!). Our hypothesis is that scenarios based on NNT-WET in lieu of NNT would result in patients and clinicians selecting marginally effective treatments less frequently. We are awaiting review of our grant proposal from the nihilism study section of the NIH.
NNT-DRI: Number Needed To Divert Reckless Intervention
After laborious review of a thesaurus to make the acronyms work, we arrived at the “number needed to divert reckless intervention”. Using the same rationale as NNT-WET for NNT, we propose the NNT-DRI as a revised measure for NNH. The NNT-DRI = NNH - 1.
The NNT-DRI estimates the average number of patients who need to be treated, and who escape the therapy’s adverse effects, in order for someone else to sustain the adverse event. It is a measure of the recklessness of clinical intervention; a small NNT-DRI indicates that only a few patients escape harm, whereas a large NNT-DRI is reassuring since regardless of whether any patient benefits, many patients are not harmed. A large NNT-DRI is a state of Hippocratic bliss; "primum non nocere".
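The NNT-DRI is just as easy to sketch. The function name `nnt_dri` is hypothetical, and the NNH of five is an illustrative value, not one taken from a specific trial.

```python
def nnt_dri(nnh):
    """Return NNT-DRI = NNH - 1: treated patients who escape the harm."""
    if nnh < 1:
        raise ValueError("NNH must be at least 1")
    return nnh - 1


# Illustrative value: with an NNH of five, four treated patients
# escape the adverse event for every one who sustains it.
nnh = 5
print(f"NNH {nnh}: {nnt_dri(nnh)} of every {nnh} treated patients escape harm")
```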
Are There Advantages to the NNT-DRI Over the NNH?
Yes, the NNH insufficiently acknowledges the patients who regularly escape therapeutic maleficence. Clinicians should rejoice in large NNT-DRI estimates that represent the multitudes of patients they have not harmed (Figure 1B).
The NNT-DRI helps illustrate that adverse effects of treatments are disproportionately applied to an unfortunate few, while the rest manage to escape them. NNT-DRI shifts the clinical conversation from a serious discussion of risk (e.g., “This treatment is dangerous, the NNH is only five.”), to a state of reassurance best described as willful ignorance (e.g., “Maybe so, but four of them will do just fine!”).
Can You Give A Practical Example of the NNT-WET and NNT-DRI?
Take thrombolysis for the treatment of acute ischemic stroke. Using risk estimates from the 2014 meta-analysis of individual patient data by Emberson et al in Lancet, we estimated the NNT to achieve excellent functional recovery 3 to 6 months after treatment ranged from 10 (0–3 hours after symptom onset) to 50 (4.5–6 hours after symptom onset).
These estimates translate to NNT-WET values ranging from 9 to 49, respectively. Thus,

Dec 14, 2019 • 29min
SGEM#278: Seen Your Video for Acute Otitis Media Discharge Instructions?
Date: December 13th, 2019
Reference: Belisle et al. Video discharge instructions for acute otitis media in children: a randomized controlled open-label trial. AEM December 2019
Guest Skeptic: Dr. Chris Bond is an emergency medicine physician and Assistant Professor at the University of Calgary. He is also an avid FOAM supporter/producer through various online outlets including TheSGEM.
Case: An 18-month-old, previously healthy female presents to the emergency department with 24 hours of fever. Over the past few days, the parents have noted some rhinorrhea and cough. She looks well, her immunizations are up to date and her examination reveals right-sided acute otitis media (AOM). When discussing discharge instructions for her AOM, you wonder whether having the parents watch a video will be more beneficial for the child’s symptoms than giving the parents oral instructions with a paper handout.
Background: AOM is the second most commonly diagnosed illness in children and the most common indication for antibiotic prescription [1-2]. There are significant costs associated with AOM and parents often bring their children to health care providers for evaluation of pain and fever [3-4]. More than one third of children experience pain, fever or both three to seven days following treatment, and nearly seventy-five percent of parents identify pain and disturbed sleep as the most important sources of AOM related burden [5-6].
There is significant parental uncertainty regarding treatment of AOM and less than 30% of US parents receive instructions on appropriate analgesia for their children [7-8]. Discharge instruction complexity and inadequate comprehension are associated with medication errors, suboptimal post-discharge care and unnecessary recidivism [9-12]. Medication errors can be reduced using standardized discharge instructions, and parents prefer these to verbal summaries [13-15].
Video discharge instructions have been shown to be preferred over paper instructions in many pediatric presentations; however, no study has explored the effectiveness of video instructions for AOM [16-17].
Clinical Question: Are video discharge instructions superior to a paper handout with respect to the Acute Otitis Media Severity of Symptoms (AOM-SOS) score?
Reference: Belisle et al. Video discharge instructions for acute otitis media in children: a randomized controlled open-label trial. AEM December 2019
Population: Parents of children aged 6 months to 17 years with a chief complaint of otalgia in the setting of URTI and where the treating physician was at least 50% certain of a clinical diagnosis of AOM. Diagnostic certainty was on a 100mm visual analog scale based on the physicians’ rating of color photos of AOM.
Excluded: Parents who were not the primary care provider, had poor English proficiency, lacked internet or telephone access, and whose children had: a pre-existing diagnosis of AOM (<72 hours old); other concomitant diagnoses (pneumonia, urinary tract infection, gastroenteritis, sinusitis, or any other condition requiring antibiotics and/or hospital admission); tympanostomy tubes; acute tympanic membrane perforation.
Intervention: Video discharge instructions
Comparison: Paper-based discharge instructions identical to the video discharge instructions
Outcome:
Primary Outcome: AOM Severity of Symptoms (AOM-SOS) score on day three post-discharge.
Secondary Outcomes: Knowledge questionnaire scores, parental satisfaction with the intervention, number of days of missed school or daycare (child) and work (parent), proportion of children with at least one return visit to a healthcare provider, and proportion of children who received analgesia.
Dr. Naveen Poonai
This is an SGEMHOP episode which means we have the lead author on the show. Dr. Naveen Poonai is a Paediatric Emergency Medicine physician at the Children’s Hospital, London Health Sciences Centre, Associate Professor of Paediatrics and Internal Medicine at Western University, Canadian Association of Paediatric Health Centres (CAPHC) project lead for Paediatric Pain Assessment, and has a cross-appointment with the Department of Epidemiology and Biostatistics. He was previously on SGEM#177 discussing POCUS for diagnosing pediatric fractures.
This episode we are going to be talking about acute otitis media. There are a number of different guidelines out there for acute otitis media (Canadian Pediatric Society, American Academy of Pediatrics, American Academy of Family Physicians, United Kingdom, and Australia). Naveen prefers the Canadian Pediatric Society guidelines.
Canadian Pediatric Society algorithm for the management of AOM in children over 6 months of age.
Authors’ Conclusions: “Children of parents with AOM who watched a five-minute video in the ED detailing the identification and management of pain and fever experienced a clinically important and statistically significant decrease in symptomatology compared to a paper handout.”
Quality Checklist for Randomized Clinical Trials:
The study population included or focused on those in the emergency department. Yes
The study participants were adequately randomized. Yes
The randomization process was concealed. Yes
The participants were analyzed in the groups to which they were randomized. Yes
The study participants were recruited consecutively (i.e. no selection bias). No
The participants in both groups were similar with respect to prognostic factors. Unsure
All participants were unaware of group allocation. No
All groups were treated equally except for the intervention. Yes
Follow-up was complete (i.e. at least 80% for both groups). No
All patient-important outcomes were considered. Yes
The treatment effect was large enough and precise enough to be clinically significant. Yes
Key Results: Overall, 5334 parents were screened for eligibility, 219 were randomized and analyzed and 149 completed the primary outcome (77 video; 72 paper instructions). Children included 107/219 (49%) females with an overall mean age of 2.9 years and 41/219 (18.7%) were not offered analgesia prior to arrival. There were no crossovers in the trial.
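The "No" for complete follow-up on the quality checklist can be checked against these numbers. This is a rough sketch using only the overall counts reported here; the checklist threshold applies to each group, and the per-group randomization counts are not given in these notes, so only the overall proportion is computed. The variable names are made up for this sketch.

```python
randomized = 219
completed_primary_outcome = 77 + 72  # video group + paper group

# Proportion of randomized parents who completed the primary outcome.
follow_up = completed_primary_outcome / randomized
meets_threshold = follow_up >= 0.80  # checklist asks for at least 80%

print(f"Overall follow-up: {follow_up:.1%}")
print("Meets 80% threshold:", meets_threshold)
```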
AOM-SOS score was significantly lower on day 3 in the video group
Primary Outcome: AOM-SOS score on day three (0 to 14 with higher scores indicative of greater symptom severity)
1 video group vs. 3 paper group (p=0.004) even after adjusting for pre-intervention AOM-SOS and medication use (analgesics and antibiotics)
Secondary Outcomes:
There were no significant differences in secondary outcomes, including knowledge gain, functional outcomes or the number of children receiving antibiotics or analgesics following discharge.
1. Children: You included children aged 6 months to 17 years. There is a big difference between an infant and a teenager. Why not just limit it to children under 5 years old? The mean age was 2.9 years with a SD of 2.8 years.
It is true that a young child is quite different from a teenager. We decided to cast a wide net to be more instead of less inclusive. Older children suffer from AOM as well and inclusion of these individuals extends the generalizability of our findings.
2. Diagnosis of AOM: The diagnosis of AOM can be a bit tricky. You included patients for whom the physician was at least 50% certain of a clinical diagnosis of AOM using a 100mm visual analog scale. That was based on color photos of AOM from published diagnostic criteria. Why not use more objective criteria like tympanometry or acoustic reflectometry to increase diagnostic certainty?
In an ideal world we would have been able to use tympanometry or acoustic reflectometry; however, these tools are unfortunately not available in our emergency department.
3. Convenience Sample: Recruitment was done seven days a week from 10am to 10pm. We understand the realities of conducting research and having someone available 24 hours a day. However, do you think parents that present overnight with sick children are different than those who present during the day?
It is possible that children that present in the middle of the night are experiencing more pain than those that present during daytime or evening hours. But it is more likely that the pain they are experiencing is disruptive to their sleep and perhaps more so, their parents’ sleep. Parents that present with their child overnight may process discharge information quite differently from daytime hours.
4. Single Tertiary Pediatric Centre: This was a single centre study done at a pediatric emergency department. Do you think this data can be extrapolated to other pediatric emergency departments in Canada or internationally?
I think that this data can certainly be extrapolated to other Canadian pediatric emergency departments as other tertiary care pediatric centres are likely to have populations similar to ours. However, further study would have to be undertaken to determine if the data would be applicable to international populations of differing languages and cultures. We excluded non-English speaking populations for feasibility purposes and so this study would have to be repeated including those speaking other languages to be able to confidently say the data apply more broadly.
In addition, I work in a rural community emergency department. We see adults and children. Do you think these results would apply to non-pediatric emergency departments?
I think these results would definitely apply to pediatric patients from English-speaking families in rural community emergency departments.
5. Education Level: The parents in your study were well educated. More than 70% had at least a college education. How do you think this could have impacted your results?
I think this may have contributed to the reason we saw no difference in knowledge acquisition between groups.


