Linear Digressions
Ben Jaffe and Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Mentioned books
Jan 15, 2018 • 18min
Challenges with Using Machine Learning to Classify Chest X-Rays
Another installment in our "machine learning might not be a silver bullet for solving medical problems" series. This week, we have a high-profile blog post that has been making the rounds for the last few weeks, in which a neural network trained to visually recognize various diseases in chest x-rays is called into question by a radiologist with machine learning expertise. As it seemingly always does, it comes down to the dataset that's used for training--medical records assume a lot of context that may or may not be available to the algorithm, so it's tough to make something that actually helps (in this case) predict disease that wasn't already diagnosed.
Jan 8, 2018 • 16min
The Fourier Transform
The Fourier transform is one of the handiest tools in signal processing for dealing with periodic time series data. Using a Fourier transform, you can break apart a complex periodic function into a bunch of sine and cosine waves, and figure out what the amplitude, frequency and offset of those component waves are. It's a really handy way of re-expressing periodic data--you'll never look at a time series graph the same way again.
Jan 2, 2018 • 15min
Statistics of Beer
What better way to kick off a new year than with an episode on the statistics of brewing beer?
Dec 24, 2017 • 10min
Re - Release: Random Kanye
We have a throwback episode for you today as we take the week off to enjoy the holidays. This week: what happens when you have a markov chain that generates mashup Kanye West lyrics with Bible verses? Exactly what you think.
Dec 18, 2017 • 18min
Debiasing Word Embeddings
When we covered the Word2Vec algorithm for embedding words, we mentioned parenthetically that the word embeddings it produces can sometimes be a little bit less than ideal--in particular, gender bias from our society can creep into the embeddings and give results that are sexist. For example, occupational words like "doctor" and "nurse" are more highly aligned with "man" or "woman," which can create problems because these word embeddings are used in algorithms that help people find information or make decisions. However, a group of researchers has released a new paper detailing ways to de-bias the embeddings, so we retain gender info that's not particularly problematic (for example, "king" vs. "queen") while correcting bias.
Dec 11, 2017 • 18min
The Kernel Trick and Support Vector Machines
Picking up after last week's episode about maximal margin classifiers, this week we'll go into the kernel trick and how that (combined with maximal margin algorithms) gives us the much-vaunted support vector machine.
Dec 4, 2017 • 14min
Maximal Margin Classifiers
Maximal margin classifiers are a way of thinking about supervised learning entirely in terms of the decision boundary between two classes, and defining that boundary in a way that maximizes the distance from any given point to the boundary. It's a neat way to think about statistical learning and a prerequisite for understanding support vector machines, which we'll cover next week--stay tuned!
Nov 27, 2017 • 14min
Re - Release: The Cocktail Party Problem
Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!
Nov 20, 2017 • 16min
Clustering with DBSCAN
DBSCAN is a density-based clustering algorithm for doing unsupervised learning. It's pretty nifty: with just two parameters, you can specify "dense" regions in your data, and grow those regions out organically to find clusters. In particular, it can fit irregularly-shaped clusters, and it can also identify outlier points that don't belong to any of the clusters. Pretty cool!
Nov 13, 2017 • 25min
The Kaggle Survey on Data Science
Want to know what's going on in data science these days? There's no better way than to analyze a survey with over 16,000 responses that recently released by Kaggle. Kaggle asked practicing and aspiring data scientists about themselves, their tools, how they find jobs, what they find challenging about their jobs, and many other questions. Then Kaggle released an interactive summary of the data, as well as the anonymized dataset itself, to help data scientists understand the trends in the data. In this episode, we'll go through some of the survey toplines that we found most interesting and counterintuitive.


