Linear Digressions
Ben Jaffe and Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Aug 19, 2018 • 26min
Agile Development for Data Scientists, Part 1: The Good
Discover how Agile principles in software development can enhance data science workflows. The discussion highlights the iterative nature of Agile, showcasing its advantages over traditional methodologies like Waterfall. Learn about adaptive planning and the importance of setting clear goals to ensure project success. The speakers also delve into engaging stakeholders for better decision-making and the unique estimation technique of story pointing, using the Fibonacci sequence to navigate task complexity. Tune in for insights that can revolutionize your data science projects!
Aug 13, 2018 • 18min
Re-Release: How To Lose At Kaggle
We've got a classic for you this week as we take a week off for the dog days of summer. See you again next week!
Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists. Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced. It's not just bad luck: overfitting to the public leaderboard, which is especially easy to do on popular competitions, can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.
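As a quick illustration of that trap, here's a minimal simulation (all numbers hypothetical): hundreds of submissions with identical true accuracy, where crowning a winner by public-leaderboard score practically guarantees a letdown on the private split.

```python
import numpy as np

rng = np.random.default_rng(42)
n_public, n_private, n_submissions = 1_000, 9_000, 500

# Hypothetical setup: every submission has the same true accuracy (0.80);
# leaderboard scores differ only by which examples land in each split.
true_acc = 0.80
public = rng.binomial(n_public, true_acc, n_submissions) / n_public
private = rng.binomial(n_private, true_acc, n_submissions) / n_private

best = public.argmax()  # the submission that "wins" the public leaderboard
print(f"winner's public score:  {public[best]:.3f}")   # inflated by luck
print(f"winner's private score: {private[best]:.3f}")  # regresses to ~0.800
print(f"winner's private rank:  {(private > private[best]).sum() + 1}")
```

The public-leaderboard winner was simply the luckiest draw on the public split, so its private score falls back toward the pack: exactly the end-of-contest tumble described above.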
Aug 6, 2018 • 30min
Troubling Trends In Machine Learning Scholarship
There are a lot of great machine learning papers coming out every day--and, if we're being honest, some papers that are not as great as we'd wish. In some ways this is symptomatic of a field that's growing really quickly, but it's also an artifact of strange incentive structures in academic machine learning, and the fact that sometimes machine learning is just really hard. At the same time, high-quality academic work is critical for maintaining the reputation of the field, so in this episode we walk through a recent paper that spells out some of the most common shortcomings of academic machine learning papers and what we can do to make things better.
Relevant links:
https://arxiv.org/abs/1807.03341
Jul 29, 2018 • 29min
Can Fancy Running Shoes Cause You To Run Faster?
The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article, which works through an analysis of thousands of runners' data in four different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode!
Relevant links:
https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
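As a toy illustration of the correlation trap (purely hypothetical numbers, not the article's analysis): if more serious runners both buy the new shoe and run faster anyway, a naive comparison will flatter a shoe that does nothing.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Hypothetical confounder: more serious runners both buy the new shoe and
# run faster regardless of footwear; the shoe itself does nothing here.
seriousness = rng.normal(0, 1, n)
wears_shoe = rng.random(n) < 1 / (1 + np.exp(-seriousness))  # keener runners buy it
pace = 5.0 - 0.3 * seriousness + rng.normal(0, 0.5, n)       # min/km; lower = faster

gap = pace[wears_shoe].mean() - pace[~wears_shoe].mean()
print(f"naive pace gap: {gap:+.2f} min/km")  # shoe wearers look noticeably faster
# ...despite a causal effect of exactly zero, which is why the article leans on
# comparing the same (or carefully matched) runners with and without the shoe.
```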
Jul 22, 2018 • 23min
Compliance Bias
When you're using an AB test to understand the effect of a treatment, there are a lot of assumptions about how the treatment (and the control, for that matter) gets applied. For example, it's easy to assume that everyone who was assigned to the treatment arm actually gets the treatment, everyone in the control arm doesn't, and that both groups get their treatment instantaneously. None of these things happen in real life, and if you really care about measuring your treatment effect then that's something you want to understand and correct for. In this episode we'll talk through a great blog post that outlines this for mobile experiments. Oh, and Ben sings.
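Here's a minimal sketch of the dilution problem (hypothetical numbers, not the blog post's method): when only ~70% of the treatment arm actually gets treated, the intent-to-treat comparison understates the true effect, and rescaling by the compliance rate recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: random assignment, but only ~70% of the treatment
# arm actually receives the treatment (non-compliance).
assigned = rng.integers(0, 2, n)               # randomized assignment (0/1)
treated = assigned & (rng.random(n) < 0.7)     # who actually got treated
outcome = 1.0 * treated + rng.normal(0, 1, n)  # true treatment effect = 1.0

# Intent-to-treat: compare by assignment; diluted toward zero by non-compliers.
itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()

# One standard correction: rescale ITT by the observed compliance rate.
compliance = treated[assigned == 1].mean()
print(f"ITT = {itt:.2f}, ITT / compliance = {itt / compliance:.2f}")
# prints roughly: ITT = 0.70, ITT / compliance = 1.00
```

Note that the rescaling trick is only this clean because compliance here is random; when compliance correlates with the outcome, the correction gets subtler, which is part of what makes this topic interesting.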
Jul 15, 2018 • 19min
AI Winter
Artificial Intelligence has been widely lauded as a solution to almost any problem. But when we juxtapose the hype in the field against the real-world benefits we see, it raises a question: are we coming up on an AI winter?
Jul 8, 2018 • 19min
Rerelease: How to Find New Things to Learn
We like learning on vacation. And we're on vacation, so we thought we'd re-air this episode about how to learn.
Original Episode: https://lineardigressions.com/episodes/2017/5/14/how-to-find-new-things-to-learn
Original Summary: If you're anything like us, you a) always are curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible). We hope this podcast is a part of the solution for you, but if you're looking to go farther (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way. Boring old PDFs full of inscrutable math notation, your days are numbered!
Jul 2, 2018 • 25min
Rerelease: Space Codes
We're on vacation on Mars, so we won't be communicating with you all directly this week. Though, if we wanted to, we could probably use this episode to help get started.
Original Episode: http://lineardigressions.com/episodes/2017/3/19/space-codes
Original Summary: It's hard to get information to and from Mars. Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge. The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled. How, then, can you encode messages so that errors can be detected and corrected? How does the decoding process allow you to actually find and correct the errors? In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.
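To get a feel for the convolutional-coding-plus-Viterbi piece of that pipeline, here's a toy rate-1/2 decoder built on the textbook (7,5) code--far simpler than what NASA actually flies--that corrects a single flipped bit.

```python
# Rate-1/2, constraint-length-3 convolutional code with generators (7, 5)
# in octal: a textbook toy, not NASA's actual deep-space code.
G = [(1, 1, 1), (1, 0, 1)]  # taps applied to [input bit, s1, s0]

def encode(bits):
    s1 = s0 = 0
    out = []
    for b in bits:
        for taps in G:
            out.append(sum(g * w for g, w in zip(taps, (b, s1, s0))) % 2)
        s1, s0 = b, s1
    return out

def viterbi_decode(received, n_bits):
    INF = float("inf")
    metrics = [0.0, INF, INF, INF]  # state = (s1 << 1) | s0; start at state 0
    paths = [[], [], [], []]
    for t in range(n_bits):
        r = received[2 * t: 2 * t + 2]
        new_metrics, new_paths = [INF] * 4, [None] * 4
        for state in range(4):
            if metrics[state] == INF:
                continue
            s1, s0 = state >> 1, state & 1
            for b in (0, 1):  # hypothesize each possible input bit
                expected = [sum(g * w for g, w in zip(taps, (b, s1, s0))) % 2
                            for taps in G]
                cost = metrics[state] + sum(e != x for e, x in zip(expected, r))
                nxt = (b << 1) | s1
                if cost < new_metrics[nxt]:  # keep the cheapest path per state
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[min(range(4), key=metrics.__getitem__)]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(msg)
coded[3] ^= 1                                  # one bit corrupted in transit
assert viterbi_decode(coded, len(msg)) == msg  # ...and corrected on the ground
```

The decoder never sees the original message; it just finds the valid codeword closest to what it received, which is why the redundancy added by the encoder buys you error correction, not just detection.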
Jun 25, 2018 • 16min
Rerelease: Anscombe's Quartet
We're on vacation, so we hope you enjoy this episode while we each sip cocktails on the beach.
Original Episode: http://lineardigressions.com/episodes/2017/6/18/anscombes-quartet
Original Summary: Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different. It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.
Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur. It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics. In other words, Anscombe's Quartets can be generated at will, and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.
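You can check this yourself in a few lines: seaborn happens to bundle Anscombe's Quartet as an example dataset (note that load_dataset fetches it from seaborn's online data repository).

```python
import seaborn as sns

# Tidy frame with columns 'dataset' (I-IV), 'x', and 'y'.
df = sns.load_dataset("anscombe")

for name, g in df.groupby("dataset"):
    print(f"{name}: mean_x={g.x.mean():.2f} var_x={g.x.var():.2f} "
          f"mean_y={g.y.mean():.2f} var_y={g.y.var():.2f} "
          f"corr={g.x.corr(g.y):.2f}")

# All four lines print (nearly) identical statistics; only plotting reveals
# the line, the curve, the vertical outlier, and the leverage point:
sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, ci=None)
```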
Jun 18, 2018 • 28min
Rerelease: Hurricanes Produced
Now that hurricane season is upon us again (and we are on vacation), we thought a look back on our hurricane forecasting episode was prudent. Stay safe out there.