Linear Digressions

Ben Jaffe and Katie Malone

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.

Episodes

Mentioned books

Mar 15, 2020 • 27min

Network effects re-release: when the power of a public health measure lies in widespread adoption

This week’s episode is a re-release of a recent episode, which we don’t usually do but it seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: most of the protection you get from a vaccine, for example, comes from all the other people who also got the vaccine. That’s why measures like social distancing are so important right now: even if you’re not in a high-risk group for covid-19, you should still stay home and avoid in-person socializing because your good behavior lowers the risk for those who are in high-risk groups. If we all take these kinds of measures, the risk lowers dramatically. So stay home, work remotely if you can, avoid physical contact with others, and do your part to manage this crisis. We’re all in this together.

Mar 9, 2020 • 21min

Causal inference when you can't experiment: difference-in-differences and synthetic controls

When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference in differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situations.

Mar 2, 2020 • 32min

Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng

Professor Xiao-Li Meng discusses designing data science curricula, addressing the heterogeneity of the field. He explores the importance of data quality, practical experience, and balancing data privacy. The conversation covers reshaping statistics PhD programs, staying current in the field, and the Harvard Data Science Review as a valuable resource.

Jan 27, 2020 • 25min

Running experiments when there are network effects

Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!) SUTVA, or the stable treatment unit value assumption, is a big phrase for this assumption and violations of SUTVA make for some pretty interesting experiment designs. From news feeds in LinkedIn to disentangling herd immunity from individual immunity in vaccine studies, indirect (i.e. network) effects in experiments can be just as big as, or even bigger than, direct (i.e. individual effects). And this is what we talk about this week on the podcast. Relevant links: http://hanj.cs.illinois.edu/pdf/www15_hgui.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2600548/pdf/nihms-73860.pdf

Jan 20, 2020 • 23min

Zeroing in on what makes adversarial examples possible

Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms so much as the features in the datasets that holds the clue. This week’s episode goes through several papers pushing our collective understanding of adversarial examples, and giving us clues to what makes these counterintuitive cases possible. Relevant links: https://arxiv.org/pdf/1905.02175.pdf https://arxiv.org/pdf/1805.12152.pdf https://distill.pub/2019/advex-bugs-discussion/ https://arxiv.org/pdf/1911.02508.pdf

Jan 13, 2020 • 30min

Unsupervised Dimensionality Reduction: UMAP vs t-SNE

Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with information theory, then gets into how UMAP is different (many say better). Between the time we recorded and released this episode, an interesting argument made the rounds on the internet that UMAP’s advantages largely stem from good initialization, not from advantages inherent in the algorithm. We don’t cover that argument here obviously, because it wasn’t out there when we were recording, but you can find a link to the paper below. Relevant links: https://pair-code.github.io/understanding-umap/ https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app