Linear Digressions

Ben Jaffe and Katie Malone
Aug 19, 2018 • 26min

Agile Development for Data Scientists, Part 1: The Good

Discover how Agile principles in software development can enhance data science workflows. The discussion highlights the iterative nature of Agile, showcasing its advantages over traditional methodologies like Waterfall. Learn about adaptive planning and the importance of setting clear goals to ensure project success. The speakers also delve into engaging stakeholders for better decision-making and the estimation technique of story pointing, which uses the Fibonacci sequence to size tasks by complexity. Tune in for insights that can revolutionize your data science projects!
Aug 13, 2018 • 18min

Rerelease: How To Lose At Kaggle

We've got a classic for you this week as we take a week off for the dog days of summer. See you again next week! Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists. Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced. It's not just bad luck: a very specific kind of overfitting, common on popular competitions, can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.
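The mechanism is easy to reproduce. Here's a minimal sketch (ours, not from the episode): among many equally useless models, the one that happens to score best on the small public leaderboard split usually falls right back to chance on the larger private split.

```python
# A minimal sketch of public-leaderboard overfitting: pick the best of
# 500 coin-flip "models" by public score, then check the private score.
import numpy as np

rng = np.random.default_rng(0)

n_test = 10_000           # total hidden test set
n_public = 2_000          # portion scored on the public leaderboard
y_true = rng.integers(0, 2, size=n_test)

public = slice(0, n_public)
private = slice(n_public, n_test)

def accuracy(pred, idx):
    return (pred[idx] == y_true[idx]).mean()

# 500 "models" that are all just random guesses
models = [rng.integers(0, 2, size=n_test) for _ in range(500)]

# Choose the model with the best *public* score, as a leaderboard chaser would
best = max(models, key=lambda p: accuracy(p, public))

print(f"public accuracy:  {accuracy(best, public):.3f}")   # looks impressive
print(f"private accuracy: {accuracy(best, private):.3f}")  # back to ~0.5
```

The gap between the two printed numbers is exactly the drop that knocks people down hundreds of slots in the final tally.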
Aug 6, 2018 • 30min

Troubling Trends In Machine Learning Scholarship

There are a lot of great machine learning papers coming out every day--and, if we're being honest, some papers that are not as great as we'd wish. In some ways this is symptomatic of a field that's growing really quickly, but it's also an artifact of strange incentive structures in academic machine learning, and the fact that sometimes machine learning is just really hard. At the same time, high-quality academic work is critical for maintaining the reputation of the field, so in this episode we walk through a recent paper that spells out some of the most common shortcomings of academic machine learning papers and what we can do to make things better. Relevant links: https://arxiv.org/abs/1807.03341
Jul 29, 2018 • 29min

Can Fancy Running Shoes Cause You To Run Faster?

The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article, which went through an analysis of thousands of runners' data in 4 different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode! Relevant links: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html
Jul 22, 2018 • 23min

Compliance Bias

When you're using an A/B test to understand the effect of a treatment, there are a lot of assumptions about how the treatment (and control, for that matter) get applied. For example, it's easy to think that everyone who was assigned to the treatment arm actually gets the treatment, everyone in the control arm doesn't, and that the two groups get their treatment instantaneously. None of these things happen in real life, and if you really care about measuring your treatment effect then that's something you want to understand and correct. In this episode we'll talk through a great blog post that outlines this for mobile experiments. Oh, and Ben sings.
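We don't reproduce the blog post's own analysis here, but one standard correction for non-compliance is easy to sketch: the instrumental-variable (Wald) estimator, which rescales the naive intent-to-treat effect by the difference in actual treatment uptake between the arms. A hypothetical simulation:

```python
# A hedged sketch (not from the blog post discussed) of the Wald / IV
# correction for non-compliance: random assignment is the instrument,
# actual treatment uptake is what it instruments for.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

assigned = rng.integers(0, 2, size=n)   # randomized assignment
# Only ~70% of the treatment arm actually receives the treatment,
# and ~5% of the control arm gets it anyway (crossover).
treated = np.where(assigned == 1,
                   rng.random(n) < 0.70,
                   rng.random(n) < 0.05).astype(int)

true_effect = 2.0
outcome = 10 + true_effect * treated + rng.normal(0, 5, size=n)

itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
uptake = treated[assigned == 1].mean() - treated[assigned == 0].mean()

print(f"naive intent-to-treat estimate: {itt:.2f}")           # diluted, ~1.3
print(f"compliance-adjusted estimate:   {itt / uptake:.2f}")  # ~2.0
```

The naive estimate is diluted by everyone who didn't comply with their assignment; dividing by the uptake gap recovers the effect among compliers.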
Jul 15, 2018 • 19min

AI Winter

Artificial Intelligence has been widely lauded as a solution to almost any problem. But as we juxtapose the hype in the field against the real-world benefits we see, it raises the question: are we coming up on an AI winter?
Jul 8, 2018 • 19min

Rerelease: How to Find New Things to Learn

We like learning on vacation. And we're on vacation, so we thought we'd re-air this episode about how to learn. Original Episode: https://lineardigressions.com/episodes/2017/5/14/how-to-find-new-things-to-learn Original Summary: If you're anything like us, you a) are always curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible). We hope this podcast is part of the solution for you, but if you're looking to go further (who isn't?) then we have a few new resources that present high-quality content in a fresh, accessible way. Boring old PDFs full of inscrutable math notation, your days are numbered!
Jul 2, 2018 • 25min

Rerelease: Space Codes

We're on vacation on Mars, so we won't be communicating with you all directly this week. Though, if we wanted to, we could probably use this episode to help get started. Original Episode: http://lineardigressions.com/episodes/2017/3/19/space-codes Original Summary: It's hard to get information to and from Mars.  Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge.  The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled.  How, then, can you encode messages so that errors can be detected and corrected?  How does the decoding process allow you to actually find and correct the errors?  In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.
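Of the three pieces mentioned, Viterbi decoding is compact enough to sketch in a few dozen lines. Below is a toy rate-1/2 convolutional code (constraint length 3, generators 7 and 5 in octal) with a hard-decision Viterbi decoder. Real deep-space links use longer constraint lengths and concatenate with Reed-Solomon, but the core dynamic program is the same idea.

```python
# Toy convolutional encoder + Viterbi decoder. The decoder tracks, for
# each of the 4 encoder states, the cheapest path (fewest bit-flip
# disagreements with what was received) that ends in that state.

def encode(bits):
    """Encode a bit list; emits two output bits per input bit."""
    s = 0  # state: the two most recent input bits
    out = []
    for u in bits:
        s1, s2 = s >> 1, s & 1
        out += [u ^ s1 ^ s2, u ^ s2]   # generators 111 and 101
        s = (u << 1) | s1              # shift the new bit in
    return out

def viterbi(received):
    """Maximum-likelihood decode under a Hamming (bit-flip) metric."""
    INF = float("inf")
    cost = [0, INF, INF, INF]          # encoder starts in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        r1, r2 = received[i], received[i + 1]
        new_cost = [INF] * 4
        new_paths = [None] * 4
        for s in range(4):             # for every reachable current state...
            if cost[s] == INF:
                continue
            for u in (0, 1):           # ...try both possible input bits
                s1, s2 = s >> 1, s & 1
                o1, o2 = u ^ s1 ^ s2, u ^ s2
                nxt = (u << 1) | s1
                c = cost[s] + (o1 != r1) + (o2 != r2)
                if c < new_cost[nxt]:  # keep the survivor into each state
                    new_cost[nxt] = c
                    new_paths[nxt] = paths[s] + [u]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
tx = encode(msg)
tx[3] ^= 1                             # flip one bit in transit
assert viterbi(tx) == msg              # the error is corrected
print("decoded correctly despite a channel error")
```

The scrambled bit gets fixed because every wrong path through the state machine disagrees with the received stream in more places than the true one does, which is exactly the redundancy that makes the millions-of-miles trip survivable.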
Jun 25, 2018 • 16min

Rerelease: Anscombe's Quartet

We're on vacation, so we hope you enjoy this episode while we each sip cocktails on the beach. Original Episode: http://lineardigressions.com/episodes/2017/6/18/anscombes-quartet Original Summary: Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different. It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is. Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can go one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur. It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics. In other words, Anscombe's Quartets can be generated at will, and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.
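The quartet's numbers are published, so the claim is easy to check directly. This short sketch uses Anscombe's original values:

```python
# Print the (nearly identical) summary statistics of Anscombe's four
# datasets -- the same datasets that look wildly different when plotted.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4   = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name:>3}: mean(y)={y.mean():.2f}  var(y)={y.var(ddof=1):.2f}  corr={r:.3f}")
# Every row prints ~7.50, ~4.13, ~0.816. Plot each (x, y) pair to see
# how different the four datasets actually are.
```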
Jun 18, 2018 • 28min

Rerelease: Hurricanes Produced

Now that hurricane season is upon us again (and we are on vacation), we thought a look back on our hurricane forecasting episode was prudent. Stay safe out there.
