Episode 23: Statistical and Algorithmic Thinking in the AI Age
Dec 20, 2023
Allen Downey discusses statistical paradoxes and fallacies in using data, including the base rate fallacy and algorithmic fairness. He and host Hugo dive into examples like COVID vaccination data and explore the challenges of interpreting statistical information correctly. The conversation also covers epidemiological paradoxes, Gaussian distributions, and the importance of understanding biases in data interpretation when consuming media.
Understanding statistical paradoxes in data-driven decision-making is crucial for navigating an algorithmic world.
Ensuring algorithmic fairness poses challenges due to trade-offs between different definitions of fairness and potential biases.
Causal inference complexities, like collider bias in criminal justice, underscore the need for transparent algorithm design and for adapting to emerging methods.
Deep dives
Importance of Statistical Thinking and Data Skills
The podcast discusses the importance of statistical and data skills in navigating an increasingly data-driven world. The speaker, Allen Downey, emphasizes the need to understand the statistical paradoxes and fallacies that can derail data-informed decision-making. Concrete examples are provided, such as the misleading implication drawn from COVID-19 vaccination statistics in the United Kingdom.
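To see why that implication fails, here is a minimal back-of-the-envelope sketch (the numbers are illustrative, not the actual UK figures): when most of the population is vaccinated, vaccinated people can account for most deaths even though the vaccine sharply reduces each person's risk.

```python
# Illustrative numbers, not actual UK data
p_vax = 0.90        # fraction of the population vaccinated
base_risk = 0.001   # hypothetical risk of COVID death if unvaccinated
efficacy = 0.90     # the vaccine cuts that risk by 90%

# Expected deaths per capita in each group
deaths_vax = p_vax * base_risk * (1 - efficacy)
deaths_unvax = (1 - p_vax) * base_risk

share_vax = deaths_vax / (deaths_vax + deaths_unvax)
print(f"share of deaths among the vaccinated: {share_vax:.0%}")  # ~47%
```

With these made-up numbers, nearly half the deaths are vaccinated people; push coverage toward 96% and the vaccinated share of deaths passes 70%, with the vaccine's 90% risk reduction unchanged. The headline statistic reflects vaccination coverage, not vaccine failure.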
Challenges in Algorithmic Fairness
The episode delves into algorithmic fairness and why multiple definitions of fairness generally cannot be satisfied simultaneously. It highlights a case in which machine learning algorithms used in criminal justice produced different error rates for different racial groups, prompting discussion of the trade-offs between false positives and false negatives. The episode emphasizes the importance of transparency in algorithm design to ensure fairness and mitigate bias.
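A toy simulation (invented numbers, not the actual criminal-justice data discussed in the episode) shows why the trade-off is unavoidable: a risk score that is calibrated by construction still produces different false positive rates for groups with different base rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rates(base_rate, n=1_000_000, threshold=0.5):
    """Simulate a calibrated risk score; return FPR and FNR at a threshold."""
    # Each person's true risk is drawn around the group's base rate, and the
    # score equals the true risk -- so the score is calibrated by construction.
    risk = rng.beta(20 * base_rate, 20 * (1 - base_rate), size=n)
    outcome = rng.random(n) < risk   # who actually reoffends
    flagged = risk > threshold       # who the algorithm labels high-risk
    fpr = np.mean(flagged & ~outcome) / np.mean(~outcome)
    fnr = np.mean(~flagged & outcome) / np.mean(outcome)
    return fpr, fnr

# Same calibrated score, same threshold, different base rates
for group, base_rate in [("A", 0.5), ("B", 0.3)]:
    fpr, fnr = error_rates(base_rate)
    print(f"group {group}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

Equalizing the two groups' error rates would require abandoning calibration (or using different thresholds per group), which is exactly the impossibility result the episode discusses.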
The Intricacies of Causal Inference
The podcast explores the challenges of causal inference, particularly in contexts like the criminal justice system. It discusses concepts such as collider bias and the use of causal diagrams to understand complex relationships. The conversation underscores the significance of causal inference for decision-making and the need to keep pace with emerging methods in the field.
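Collider bias is easy to demonstrate with a simulation (the variables here are hypothetical): two traits that are independent in the population become negatively correlated once you condition on a selection effect they both cause.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two independent standardized traits -- no true relationship at all
x = rng.normal(size=n)
y = rng.normal(size=n)
print(f"correlation in the full population: {np.corrcoef(x, y)[0, 1]:+.2f}")

# Selection is the collider: a case enters the dataset only if the
# combined signal from both traits is high enough
selected = (x + y) > 1.0
r = np.corrcoef(x[selected], y[selected])[0, 1]
print(f"correlation among selected cases:   {r:+.2f}")
```

Conditioning on the collider manufactures a negative correlation out of nothing, which is why causal diagrams matter: they tell you which variables you must not condition on.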
Understanding Data: Everyone is Weird and Nobody is Normal
In a large dataset with many measurements per person, nearly everyone turns out to be weird in at least one way, challenging the idea that anyone is "normal." There are so many dimensions along which an individual can deviate from the average that uniqueness is common. The "thin shell" result for multivariate distributions explains why: almost everyone sits at roughly the same distance from the multivariate average, so in that sense people are all equally weird.
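A quick simulation illustrates the thin shell effect (made-up standardized traits, not the book's datasets): with enough independent measurements, nobody is close to the multivariate average, and almost everyone is about the same distance from it.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_traits = 10_000, 100

# Each person gets 100 independent standardized measurements
traits = rng.normal(size=(n_people, n_traits))

# Virtually everyone is "weird" (beyond 2 sd) on at least one trait
weird = np.any(np.abs(traits) > 2, axis=1)
print(f"weird in at least one way: {weird.mean():.0%}")   # ~99%

# Distance from the multivariate average concentrates near sqrt(100) = 10
dist = np.linalg.norm(traits, axis=1)
print(f"distance from average: {dist.mean():.1f} +/- {dist.std():.2f}")
```

The distances cluster in a thin shell around 10, so in a hundred-dimensional sense everyone is weird, and roughly equally so.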
Log Normal Distributions and Achievements in Life
Adult weight illustrates the log-normal distribution: the measurements skew to the right, with more heavy individuals than a Gaussian model would predict. The theory behind the log-normal distribution is multiplicative: where sums of random factors produce a Gaussian, products of random factors produce a skewed, log-normal distribution, as with adult weight. The same model fits life achievements: talents may be roughly Gaussian at birth, but achievements compound multiplicatively and follow a log-normal pattern, which helps explain extreme outliers and their continued growth.
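The multiplicative mechanism is easy to verify numerically (a sketch with invented growth factors, not actual weight data): just as summing many random terms yields a Gaussian, multiplying many random factors yields a log-normal, because the log of a product is a sum.

```python
import numpy as np

rng = np.random.default_rng(3)

def skewness(a):
    """Sample skewness: the third standardized moment."""
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

# Each outcome is the product of 50 small random growth factors
# (made-up values standing in for whatever drives adult weight)
factors = rng.uniform(0.95, 1.07, size=(100_000, 50))
outcome = factors.prod(axis=1)

print(f"skewness of the outcome:  {skewness(outcome):+.2f}")          # right-skewed
print(f"skewness of log(outcome): {skewness(np.log(outcome)):+.2f}")  # near zero
```

The product is noticeably right-skewed while its logarithm is nearly symmetric, which is the signature of a log-normal distribution.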
Hugo speaks with Allen Downey, a curriculum designer at Brilliant, Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats, and other computer science and data science books. In 2019-20 he was a Visiting Professor at Harvard University. He previously taught at Wellesley College and Colby College and was a Visiting Scientist at Google. He is also the author of the upcoming book Probably Overthinking It!
They discuss Allen's new book and the key statistical and data skills we all need to navigate an increasingly data-driven and algorithmic world. The goal was to dive deep into the statistical paradoxes and fallacies that get in the way of using data to make informed decisions.
For example, when it was reported in 2021 that “in the United Kingdom, 70-plus percent of the people who die now from COVID are fully vaccinated,” this was correct, but the implication was entirely wrong. Their conversation jumps into many such concrete examples to get to the bottom of using data for more than “lies, damned lies, and statistics.” They cover:
Information and misinformation around pandemics and the base rate fallacy;
The tools we need to comprehend the small probabilities of high-risk events such as stock market crashes, earthquakes, and more;
The many definitions of algorithmic fairness, why they can't all be met at once, and what we can do about it;
Public health, the need for robust causal inference, and variations on Berkson’s paradox, such as the low-birthweight paradox: an influential paper found that among low-birthweight babies, the mortality rate is lower for children of smokers;
Why none of us are normal in any sense of the word, in both physical and psychological measurements;
The inspection paradox, which shows up in the criminal justice system and distorts our perception of prison sentences and the risk of repeat offenders (see the sketch just below).
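To make the inspection paradox concrete, here is a sketch with hypothetical sentence lengths: a point-in-time survey of a prison samples each sentence with probability proportional to its length, so long sentences are overrepresented and the observed average is inflated.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sentence lengths in years, as handed down by the courts
sentences = rng.exponential(scale=3.0, size=100_000)
print(f"mean sentence handed down:     {sentences.mean():.1f} years")

# On any given day, a sentence is "in progress" with probability
# proportional to its length -- that is what a prison survey samples
weights = sentences / sentences.sum()
surveyed = rng.choice(sentences, size=100_000, p=weights)
print(f"mean sentence seen in survey:  {surveyed.mean():.1f} years")
```

For an exponential distribution, length-biased sampling roughly doubles the observed mean: the survey sees about six-year sentences on average even though the courts hand down three-year sentences on average. The same distortion inflates the apparent risk of repeat offenders, since people with many offenses are more likely to show up in any sample.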