Linear Digressions
Ben Jaffe and Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Apr 25, 2016 • 17min
Model Interpretation (and Trust Issues)
Machine learning algorithms can be black boxes--inputs go in, outputs come out, and what happens in the middle is anybody's guess. But understanding how a model arrives at an answer is critical for interpreting the model, and for knowing if it's doing something reasonable (one could even say... trustworthy). We'll talk about a new algorithm called LIME that seeks to make any model more understandable and interpretable.
Relevant links:
http://arxiv.org/abs/1602.04938
https://github.com/marcotcr/lime/tree/master/lime
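For the curious, here's a minimal sketch of how the linked lime package is typically used to explain a single prediction, assuming a trained scikit-learn classifier (the dataset and variable names are illustrative, not from the episode):

```python
# Explain one prediction of a black-box model with LIME.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(data.data, data.target)

explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
)

# LIME perturbs this one instance, queries the black-box model on the
# perturbations, and fits a local linear surrogate to its behavior.
explanation = explainer.explain_instance(
    data.data[0], clf.predict_proba, num_features=4
)
print(explanation.as_list())  # (feature, weight) pairs for the local model
```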
Apr 18, 2016 • 32min
Updates! Political Science Fraud and AlphaGo
We've got updates for you about topics from past shows! First, the political science scandal of 2015 has a new chapter; we'll remind you of the original story, then dive into what has happened since. Then we've got an update on AlphaGo and his/her/its much-anticipated match against the human champion of the game Go.
Relevant links:
https://soundcloud.com/linear-digressions/electoral-insights-part-2
https://soundcloud.com/linear-digressions/go-1
http://www.sciencemag.org/news/2016/04/talking-people-about-gay-and-transgender-issues-can-change-their-prejudices
http://science.sciencemag.org/content/sci/352/6282/220.full.pdf
http://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/
http://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/
http://www.wired.com/2016/03/sadness-beauty-watching-googles-ai-play-go/
Apr 11, 2016 • 19min
Ecological Inference and Simpson's Paradox
Simpson's paradox is the data science equivalent of looking through one eye and seeing a very clear trend, and then looking through the other eye and seeing the very clear opposite trend. You see a trend going one way in a group as a whole, but breaking the group into subgroups gives the exact opposite trend (a small worked example follows the links below). Confused? Scratching your head? Welcome to the tricky world of ecological inference.
Relevant links:
https://gking.harvard.edu/files/gking/files/part1.pdf
http://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html
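Here's the promised worked example of the reversal, using the classic kidney-stone-treatment numbers (chosen for illustration, not data from the episode):

```python
# Simpson's paradox in miniature: treatment A wins in every subgroup,
# yet loses overall, because the two arms saw very different case mixes.
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for group, arms in groups.items():
    for arm, (success, n) in arms.items():
        totals[arm][0] += success
        totals[arm][1] += n
        print(f"{group:12s} {arm}: {success}/{n} = {success / n:.0%}")

for arm, (success, n) in totals.items():
    print(f"overall      {arm}: {success}/{n} = {success / n:.0%}")

# A wins both subgroups (93% vs 87%, 73% vs 69%); B wins overall (83% vs 78%).
```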
Apr 4, 2016 • 15min
Discriminatory Algorithms
Sometimes when we say an algorithm discriminates, we mean it can tell the difference between two types of items. But in this episode, we'll talk about another, more troublesome side to discrimination: algorithms can be... racist? Sexist? Ageist? Yes to all of the above. It's an important thing to be aware of, especially when doing people-centered data science. We'll discuss how and why this happens, and what solutions are out there (or not).
Relevant links:
http://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html
http://techcrunch.com/2015/08/02/machine-learning-and-human-bias-an-uneasy-pair/
http://www.sciencefriday.com/segments/why-machines-discriminate-and-how-to-fix-them/
https://medium.com/@geomblog/when-an-algorithm-isn-t-2b9fe01b9bb5#.auxqi5srz
Mar 28, 2016 • 32min
Recommendation Engines and Privacy
This episode started out as a discussion of recommendation engines, like Netflix uses to suggest movies. There's still a lot of that in here. But a related topic, which is both interesting and important, is how to keep data private in the era of large-scale recommendation engines--what mistakes have been made surrounding supposedly anonymized data, how data ends up de-anonymized, and why it matters for you.
Relevant links:
http://www.netflixprize.com/
http://bits.blogs.nytimes.com/2010/03/12/netflix-cancels-contest-plans-and-settles-suit/?_r=0
http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf
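As a rough illustration of the recommendation side, here's a minimal item-based collaborative filtering sketch (heavily simplified relative to anything Netflix runs; the ratings are invented):

```python
# Score an unseen movie for a user via similarity to movies they rated.
import numpy as np

# Rows = users, columns = movies; 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between movie columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict movie 2 for user 0: similarity-weighted average of their ratings.
user, movie = 0, 2
rated = R[user] > 0
score = sim[movie, rated] @ R[user, rated] / sim[movie, rated].sum()
print(f"predicted rating: {score:.2f}")  # low-ish: user 0 favors movies 0 and 1
```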
Mar 21, 2016 • 19min
Neural nets play cops and robbers (AKA generative adversarial networks)
One neural net is creating counterfeit bills and passing them off to a second neural net, which is trying to distinguish the real money from the fakes. Result: two neural nets that are better than either one would have been without the competition.
Relevant links:
http://arxiv.org/pdf/1406.2661v1.pdf
http://arxiv.org/pdf/1412.6572v3.pdf
http://soumith.ch/eyescream/
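For a feel of how that game is coded up, here's a minimal adversarial training loop in PyTorch (an illustration of the idea in the linked Goodfellow et al. paper, not its exact architecture or hyperparameters):

```python
# Generator G counterfeits samples; discriminator D tries to spot them.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # counterfeiter
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # inspector
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0  # "real money": samples from N(3, 0.5)
    fake = G(torch.randn(64, 8))           # counterfeits generated from noise

    # Discriminator update: label real samples 1, fakes 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: try to make the discriminator call fakes real.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())  # should drift toward ~3.0
```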
Mar 14, 2016 • 19min
A Data Scientist's View of the Fight against Cancer
In this episode, we're taking many episodes' worth of insights and unpacking an extremely complex and important question--in what ways are we winning the fight against cancer, where might that fight go in the coming decade, and how do we know when we're making progress? No matter how tricky you might think this problem is to solve, the fact is, once you get in there trying to solve it, it's even trickier than you thought.
Mar 11, 2016 • 21min
Congress Bots and DeepDrumpf
Hey, sick of the election yet? Fear not, there are algorithms that can automagically generate political-ish speech so that we never need to be without an endless supply of Congressional speeches and Donald Trump twitticisms!
Relevant links:
http://arxiv.org/pdf/1601.03313v2.pdf
http://qz.com/631497/mit-built-a-donald-trump-ai-twitter-bot-that-sounds-scarily-like-him/
https://twitter.com/deepdrumpf
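To get a feel for statistical text generation, here's a toy bigram Markov chain (the linked work uses much richer models, like recurrent neural nets; the corpus here is a stand-in):

```python
# Generate text by repeatedly sampling a word that followed the current
# word somewhere in the corpus.
import random

corpus = ("we will make trade great again and we will win "
          "we will build jobs and we will make jobs great again").split()

# Map each word to the words that follow it in the corpus.
follows = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, []).append(b)

word, output = "we", ["we"]
for _ in range(12):
    word = random.choice(follows.get(word, corpus))
    output.append(word)
print(" ".join(output))
```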
Mar 7, 2016 • 11min
Multi-Armed Bandits
Multi-armed bandits: how to take your randomized experiment and make it harder better faster stronger. Basically, a multi-armed bandit experiment allows you to optimize for both learning and making use of your knowledge at the same time. It's what the pros (like Google Analytics) use, and it's got a great name, so... winner!
Relevant link:
https://support.google.com/analytics/answer/2844870?hl=en
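Here's a minimal sketch of one popular bandit strategy, Thompson sampling, which is reportedly what Google Analytics' content experiments use (the click rates below are invented for illustration):

```python
# Thompson sampling: learn which arm is best while mostly playing it.
import random

true_rates = [0.04, 0.05, 0.07]  # unknown to the experimenter
wins = [1, 1, 1]                 # Beta(1, 1) prior for each arm
losses = [1, 1, 1]
pulls = [0, 0, 0]

for _ in range(10000):
    # Sample a plausible rate per arm from its posterior; play the best.
    samples = [random.betavariate(wins[i], losses[i]) for i in range(3)]
    arm = samples.index(max(samples))
    pulls[arm] += 1
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

print(pulls)  # traffic concentrates on the best arm as evidence accumulates
```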
Mar 4, 2016 • 17min
Experiments and Messy, Tricky Causality
"People with a family history of heart disease are more likely to eat healthy foods, and have a high incidence of heart attacks." Did the healthy food cause the heart attacks? Probably not. But establishing causal links is extremely tricky, and extremely important to get right if you're trying to help students, test new medicines, or just optimize a website. In this episode, we'll unpack randomized experiments, like AB tests, and maybe you'll be smarter as a result. Will you be smarter BECAUSE of this episode? Well, tough to say for sure...
Relevant link:
http://tylervigen.com/spurious-correlations
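As a tiny sketch of why randomization licenses the causal reading: randomly assign visitors to two page versions, then ask whether the observed gap is plausible under "no effect." The counts below are invented for illustration:

```python
# Analyze a randomized AB test with a chi-squared test of independence.
from scipy.stats import chi2_contingency

#             converted, not converted
control   = [120, 9880]  # old page, n = 10,000
treatment = [180, 9820]  # new page, n = 10,000

chi2, p_value, dof, expected = chi2_contingency([control, treatment])
print(f"p = {p_value:.4f}")  # small p: a gap this big is unlikely by chance alone
```

Because assignment was random, the two groups differ (on average) only in which page they saw, so a reliable gap in conversions can be read causally; the same arithmetic on observational data cannot.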