Episode 23: Statistical and Algorithmic Thinking in the AI Age
Dec 20, 2023
Allen Downey discusses statistical paradoxes and fallacies in using data, including the base rate fallacy and algorithmic fairness. He and host Hugo dive into examples like COVID vaccination data and explore the challenges of interpreting statistical information correctly. The conversation also covers epidemiological paradoxes, Gaussian distributions, and the importance of understanding biases in data interpretation when consuming media.
Understanding statistical paradoxes in data-driven decision-making is crucial for navigating an algorithmic world.
Ensuring algorithmic fairness poses challenges due to trade-offs between different definitions of fairness and potential biases.
Causal inference complexities, like collider bias in criminal justice, underscore the need for transparent algorithm design and for adapting to emerging methods.
Deep dives
Importance of Statistical Thinking and Data Skills
The podcast discusses the importance of statistical and data skills in navigating an increasingly data-driven world. The speaker, Allen Downey, emphasizes the need to understand the statistical paradoxes and fallacies that can derail data-informed decision-making. Concrete examples are provided, such as the misleading implication drawn from COVID-19 vaccination statistics in the United Kingdom.
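To see why that implication fails, here is a minimal back-of-the-envelope sketch (the numbers are illustrative, not the actual UK figures): when most of the population is vaccinated, vaccinated people can account for most deaths even though the vaccine sharply reduces each person's risk.

```python
# Illustrative numbers, not actual UK data
p_vax = 0.90        # fraction of the population vaccinated
base_risk = 0.001   # hypothetical risk of COVID death if unvaccinated
efficacy = 0.90     # the vaccine cuts that risk by 90%

# Expected deaths per capita in each group
deaths_vax = p_vax * base_risk * (1 - efficacy)
deaths_unvax = (1 - p_vax) * base_risk

share_vax = deaths_vax / (deaths_vax + deaths_unvax)
print(f"share of deaths among the vaccinated: {share_vax:.0%}")  # ~47%
```

With these made-up numbers, nearly half the deaths are vaccinated people; push coverage toward 96% and the vaccinated share of deaths passes 70%, with the vaccine's 90% risk reduction unchanged. The headline statistic reflects vaccination coverage, not vaccine failure.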
Challenges in Algorithmic Fairness
The episode delves into algorithmic fairness and why multiple definitions of fairness generally cannot be satisfied simultaneously. It highlights a case in which machine learning algorithms used in criminal justice produced different error rates for different racial groups, prompting discussion of the trade-offs between false positives and false negatives. The episode emphasizes the importance of transparency in algorithm design to ensure fairness and mitigate bias.
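A toy simulation (invented numbers, not the actual criminal-justice data discussed in the episode) shows why the trade-off is unavoidable: a risk score that is calibrated by construction still produces different false positive rates for groups with different base rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rates(base_rate, n=1_000_000, threshold=0.5):
    """Simulate a calibrated risk score; return FPR and FNR at a threshold."""
    # Each person's true risk is drawn around the group's base rate, and the
    # score equals the true risk -- so the score is calibrated by construction.
    risk = rng.beta(20 * base_rate, 20 * (1 - base_rate), size=n)
    outcome = rng.random(n) < risk   # who actually reoffends
    flagged = risk > threshold       # who the algorithm labels high-risk
    fpr = np.mean(flagged & ~outcome) / np.mean(~outcome)
    fnr = np.mean(~flagged & outcome) / np.mean(outcome)
    return fpr, fnr

# Same calibrated score, same threshold, different base rates
for group, base_rate in [("A", 0.5), ("B", 0.3)]:
    fpr, fnr = error_rates(base_rate)
    print(f"group {group}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

Equalizing the two groups' error rates would require abandoning calibration (or using different thresholds per group), which is exactly the impossibility result the episode discusses.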
The Intricacies of Causal Inference
The podcast explores the challenges of causal inference, particularly in contexts like the criminal justice system. It discusses concepts such as collider bias and the use of causal diagrams to understand complex relationships. The conversation underscores the significance of causal inference for decision-making and the need to keep pace with emerging methods in the field.
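Collider bias is easy to demonstrate with a simulation (the variables here are hypothetical): two traits that are independent in the population become negatively correlated once you condition on a selection effect they both cause.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two independent standardized traits -- no true relationship at all
x = rng.normal(size=n)
y = rng.normal(size=n)
print(f"correlation in the full population: {np.corrcoef(x, y)[0, 1]:+.2f}")

# Selection is the collider: a case enters the dataset only if the
# combined signal from both traits is high enough
selected = (x + y) > 1.0
r = np.corrcoef(x[selected], y[selected])[0, 1]
print(f"correlation among selected cases:   {r:+.2f}")
```

Conditioning on the collider manufactures a negative correlation out of nothing, which is why causal diagrams matter: they tell you which variables you must not condition on.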
Understanding Data: Everyone is Weird and Nobody is Normal
In a large dataset with many measurements per person, nearly everyone turns out to be weird in at least one way, challenging the idea that anyone is "normal." There are so many dimensions along which an individual can deviate from the average that uniqueness is common. The "thin shell" result for multivariate distributions explains why: almost everyone sits at roughly the same distance from the multivariate average, so in that sense people are all equally weird.
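A quick simulation illustrates the thin shell effect (made-up standardized traits, not the book's datasets): with enough independent measurements, nobody is close to the multivariate average, and almost everyone is about the same distance from it.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_traits = 10_000, 100

# Each person gets 100 independent standardized measurements
traits = rng.normal(size=(n_people, n_traits))

# Virtually everyone is "weird" (beyond 2 sd) on at least one trait
weird = np.any(np.abs(traits) > 2, axis=1)
print(f"weird in at least one way: {weird.mean():.0%}")   # ~99%

# Distance from the multivariate average concentrates near sqrt(100) = 10
dist = np.linalg.norm(traits, axis=1)
print(f"distance from average: {dist.mean():.1f} +/- {dist.std():.2f}")
```

The distances cluster in a thin shell around 10, so in a hundred-dimensional sense everyone is weird, and roughly equally so.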
Log Normal Distributions and Achievements in Life
Adult weight illustrates the log-normal distribution: the measurements skew to the right, with more heavy individuals than a Gaussian model would predict. The theory behind the log-normal distribution is multiplicative: where sums of random factors produce a Gaussian, products of random factors produce a skewed, log-normal distribution, as with adult weight. The same model fits life achievements: talents may be roughly Gaussian at birth, but achievements compound multiplicatively and follow a log-normal pattern, which helps explain extreme outliers and their continued growth.
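The multiplicative mechanism is easy to verify numerically (a sketch with invented growth factors, not actual weight data): just as summing many random terms yields a Gaussian, multiplying many random factors yields a log-normal, because the log of a product is a sum.

```python
import numpy as np

rng = np.random.default_rng(3)

def skewness(a):
    """Sample skewness: the third standardized moment."""
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

# Each outcome is the product of 50 small random growth factors
# (made-up values standing in for whatever drives adult weight)
factors = rng.uniform(0.95, 1.07, size=(100_000, 50))
outcome = factors.prod(axis=1)

print(f"skewness of the outcome:  {skewness(outcome):+.2f}")          # right-skewed
print(f"skewness of log(outcome): {skewness(np.log(outcome)):+.2f}")  # near zero
```

The product is noticeably right-skewed while its logarithm is nearly symmetric, which is the signature of a log-normal distribution.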
Hugo speaks with Allen Downey, a curriculum designer at Brilliant, Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats, and other computer science and data science books. In 2019-20 he was a Visiting Professor at Harvard University. He previously taught at Wellesley College and Colby College and was a Visiting Scientist at Google. He is also the author of the upcoming book Probably Overthinking It!
They discuss Allen's new book and the key statistical and data skills we all need to navigate an increasingly data-driven and algorithmic world. The goal was to dive deep into the statistical paradoxes and fallacies that get in the way of using data to make informed decisions.
For example, when it was reported in 2021 that “in the United Kingdom, 70-plus percent of the people who die now from COVID are fully vaccinated,” this was correct, but the implication was entirely wrong. Their conversation jumps into many such concrete examples to get to the bottom of using data for more than “lies, damned lies, and statistics.” They cover:
Information and misinformation around pandemics and the base rate fallacy;
The tools we need to comprehend the small probabilities of high-risk events such as stock market crashes, earthquakes, and more;
The many definitions of algorithmic fairness, why they can't all be met at once, and what we can do about it;
Public health, the need for robust causal inference, and variations on Berkson’s paradox, such as the low-birthweight paradox: an influential paper found that among low-birthweight babies, the mortality rate is lower for children of smokers;
Why none of us are normal in any sense of the word, in both physical and psychological measurements;
The inspection paradox, which shows up in the criminal justice system and distorts our perception of prison sentences and the risk of repeat offenders (see the sketch just below).
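To make the inspection paradox concrete, here is a sketch with hypothetical sentence lengths: a point-in-time survey of a prison samples each sentence with probability proportional to its length, so long sentences are overrepresented and the observed average is inflated.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sentence lengths in years, as handed down by the courts
sentences = rng.exponential(scale=3.0, size=100_000)
print(f"mean sentence handed down:     {sentences.mean():.1f} years")

# On any given day, a sentence is "in progress" with probability
# proportional to its length -- that is what a prison survey samples
weights = sentences / sentences.sum()
surveyed = rng.choice(sentences, size=100_000, p=weights)
print(f"mean sentence seen in survey:  {surveyed.mean():.1f} years")
```

For an exponential distribution, length-biased sampling roughly doubles the observed mean: the survey sees about six-year sentences on average even though the courts hand down three-year sentences on average. The same distortion inflates the apparent risk of repeat offenders, since people with many offenses are more likely to show up in any sample.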