Episode 20: Data Science: Past, Present, and Future
Oct 5, 2023
Chris Wiggins, chief data scientist at the New York Times, and Matthew Jones, professor of history at Princeton University, discuss their book on the history of data and its impact on society. They explore the use of data for decision making, the development of mathematical statistics, Francis Galton's role in the birth of eugenics, and the rise of data, compute, and algorithms across fields.
The history of data science is intertwined with gender bias, with the contributions of women often overlooked or forgotten.
Bayesian techniques, developed by Alan Turing and his colleagues at Bletchley Park building on earlier breakthroughs by Polish mathematicians, played a crucial role in code-breaking during World War II.
The age of algorithms has brought about new challenges, including the exploitation of labor and the obfuscation of value creation in the tech industry.
The book highlights the conflict between Fisher and Neyman in statistical analysis, and the importance of Bayesianism for reasoning about the probability that a model is true.
Deep dives
The Rise of Social Physics
Social physics emerged in the nineteenth century as an attempt to apply the methods of celestial mechanics to society: to quantify social phenomena and uncover objective laws of human behavior. In practice it encouraged reductionist views that flattened complex human experience, and it went on to shape both the development of mathematical statistics and industrial practices driven by the profit motive.
Gendered Labor in Data Science
From Bletchley Park's role in code-breaking during World War II to the gendered division of labor, the story of data science has long been intertwined with gender bias. The women at Bletchley Park played crucial roles running the machines and making the code-breaking efforts possible, but their contributions were often overlooked or forgotten. This pattern of gendered labor continues today, as certain roles and responsibilities in tech and data science are still biased towards men.
The Triumph of Bayes
During the code-breaking efforts at Bletchley Park, Bayesian techniques played a crucial role in deciphering German codes. Building on earlier breakthroughs by Polish mathematicians against Enigma, Alan Turing and his colleagues used probabilistic reasoning to make decisions under uncertainty. Bayesian methods provided a practical and effective approach to real-world problems, fitting the industrial context of the time.
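Turing and his colleagues scored evidence in units they called "bans" and "decibans", essentially log-odds. As a rough sketch of that idea only (not the actual Banburismus procedure, and with invented numbers), here is how independent pieces of evidence can be accumulated:

# Illustrative sketch only: independent pieces of evidence for a hypothesis H
# add up as log-Bayes-factors, the idea behind Turing's "ban"/"deciban" units.
# All probabilities and the prior below are invented for illustration.
import math

def decibans(p_obs_given_h, p_obs_given_not_h):
    # Weight of evidence for H from a single observation, in decibans.
    return 10 * math.log10(p_obs_given_h / p_obs_given_not_h)

# Hypothetical observations: (probability under H, probability under not-H).
observations = [(0.08, 0.04), (0.05, 0.04), (0.02, 0.04), (0.09, 0.04)]

prior_odds = 1 / 1000                 # assumed long odds against H
score = 10 * math.log10(prior_odds)   # start from the prior, in decibans
for p_h, p_not_h in observations:
    score += decibans(p_h, p_not_h)   # evidence accumulates additively

posterior_odds = 10 ** (score / 10)
print(f"score: {score:.1f} decibans; posterior odds for H: {posterior_odds:.4f}")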
The Age of Algorithms and Hidden Labor
The age of algorithms has brought about a new wave of challenges, including the gig economy and the hidden labor force behind tech platforms. Books like Ghost Work and Custodians of the Internet delve into the exploitation of labor and the obfuscation of value creation in the AI and data-driven industries. These investigations shed light on the complex dynamics between technology, labor, and power in our modern society.
The conflict between Fisher and Neyman in statistical analysis
The book discusses the historical conflict between Fisher and Neyman over the foundations of statistical testing. Fisher focused on the probability of the observed data under the null hypothesis, while Neyman, with Egon Pearson, framed testing as a choice between competing hypotheses with controlled error rates. Neither approach directly answers how probable a given model is, which is the question Bayesianism addresses.
The significance of Bayesianism
The book explores the significance of Bayesianism in statistical modeling. Bayesianism asks how probable a model is given the data, which requires prior beliefs about the model before the data are seen. This approach challenges frequentist statistics by making the assumptions behind an analysis explicit.
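To make the contrast concrete, here is a toy sketch (our illustration, not an example from the book) comparing the frequentist question, how surprising are these data under a null hypothesis, with the Bayesian question, how probable is a model given the data and a prior. The coin-flip data, the single alternative model, and the 50/50 prior are all assumptions chosen for simplicity:

# Toy illustration: p-value under a null hypothesis vs. the posterior
# probability of a model. Data, models, and prior are invented.
from scipy.stats import binom

n, k = 100, 61                        # hypothetical data: 61 heads in 100 flips

# Frequentist: two-sided p-value under the null "the coin is fair".
p_value = 2 * binom.sf(k - 1, n, 0.5)

# Bayesian: compare "fair" (p = 0.5) against one alternative model (p = 0.6),
# starting from a 50/50 prior over the two models.
prior_fair = 0.5
like_fair = binom.pmf(k, n, 0.5)
like_alt = binom.pmf(k, n, 0.6)
posterior_fair = like_fair * prior_fair / (
    like_fair * prior_fair + like_alt * (1 - prior_fair)
)

print(f"p-value under the null: {p_value:.3f}")
print(f"posterior probability the coin is fair: {posterior_fair:.3f}")

Changing the prior changes the posterior; the p-value, by construction, never answers the question of how probable the model is.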
The influence of intelligence agencies on Bayesianism
The book reveals the influence of intelligence agencies, such as the NSA, in promoting Bayesian analysis during the Cold War. While academia largely opposed Bayesianism, intelligence agencies embraced it for their classified work. This has had a lasting impact on the development of Bayesian techniques.
Hugo speaks with Chris Wiggins (Columbia, NYTimes) and Matthew Jones (Princeton) about their recent book How Data Happened, and the Columbia course it expands upon, Data: Past, Present, and Future.
Chris is an associate professor of applied mathematics at Columbia University and the New York Times’ chief data scientist, and Matthew is a professor of history at Princeton University and former Guggenheim Fellow.
From facial recognition to automated decision systems that inform who gets loans and who receives bail, we all now move through a world determined by data-empowered algorithms. These technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.
DJ Patil, former U.S. Chief Data Scientist, said of the book "This is the first comprehensive look at the history of data and how power has played a critical role in shaping the history. It’s a must read for any data scientist about how we got here and what we need to do to ensure that data works for everyone."
If you’re a data scientist, a machine learning engineer, or anyone who works with data, it’s increasingly important to understand the history and future of the work you do, and how that work impacts society and the world.
Among other things, they'll delve into
the history of human use of data;
how data are used to reveal insight and support decisions;
how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal transactions and experiences; and
how exploration and analysis of data have become part of our logic and rhetoric of communication and persuasion.
You can also sign up for our next livestreamed podcast recording here!