Linear Digressions
Ben Jaffe and Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Aug 25, 2015 • 16min
Watson
This machine learning algorithm beat the human champions at Jeopardy. What is... Watson?
Aug 18, 2015 • 12min
Bayesian Psychics
Come get a little "out there" with us this week, as we use a meta-study of extrasensory perception (or ESP, often used in the same sentence as "psychics") to chat about Bayesian vs. frequentist statistics.
Aug 7, 2015 • 13min
Troll Detection
Ever found yourself wasting time reading online comments from trolls? Of course you have; we've all been there (it's 4 AM but I can't turn off the computer and go to sleep--someone on the internet is WRONG!). Now there's a way to use machine learning to automatically detect trolls, and minimize the impact when they try to derail online conversations.
Aug 3, 2015 • 12min
Yiddish Translation
Imagine a language that is mostly spoken rather than written, contains many words in other languages, and has relatively little written overlap with English. Now imagine writing a machine-learning-based translation system that can convert that language to English. That's the problem that confronted researchers when they set out to automatically translate between Yiddish and English; the tricks they used help us understand a lot about machine translation.
Jul 6, 2015 • 16min
Modeling Particles in Atomic Bombs
In a fun historical journey, Katie and Ben explore the history of the Manhattan Project, discuss the difficulties in modeling particle movement in atomic bombs with only punch-card computers and ingenuity, and eventually come to present-day uses of the Metropolis-Hastings algorithm... mentioning Solitaire along the way.
Jun 19, 2015 • 10min
Random Number Generation
Let's talk about randomness! Although randomness is pervasive throughout the natural world, it's surprisingly difficult to generate random numbers. And if your numbers look random but actually aren't, that can have interesting consequences for the security of systems and the accuracy of models and research.
In this episode, Katie and Ben talk about randomness and its place in machine learning and computation in general, along with some random digressions of their own.
Jun 9, 2015 • 21min
Electoral Insights (Part 2)
Following up on our last episode about how experiments can be performed in political science, now we explore a high-profile case of an experiment gone wrong.
An extremely high-profile paper that was published in 2014, about how talking to people can convince them to change their minds on topics like abortion and gay marriage, has been exposed as the likely product of a fraudulently produced dataset. We’ll talk about a cool data science tool called the Kolmogorov-Smirnov test, which a pair of graduate students used to reverse-engineer the likely way that the fraudulent data was generated.
But a bigger question still remains—what does this whole episode tell us about fraud and oversight in science?
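For readers curious about what the Kolmogorov-Smirnov test measures, here is a minimal sketch of the two-sample statistic: the largest gap between the empirical cumulative distributions of two samples. The function name `ks_statistic` and the toy numbers are illustrative assumptions, not the analysis or data from the episode.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs are step functions, so the largest gap
    # always occurs at one of the observed data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Toy data (made up for illustration): identical samples give 0,
# completely separated samples give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))
print(ks_statistic([1, 2, 3], [4, 5, 6]))
```

A statistic near 0 means the two samples are distributed almost identically, which is suspicious when they are supposed to come from independent surveys.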
Jun 5, 2015 • 9min
Electoral Insights (Part 1)
The first of our two-parter discussing the recent electoral data fraud case. The results of the study in question were covered widely, including by This American Life (which later had to issue a retraction).
Data science for election research involves studying voters, who are people, and people are tricky to study—every one of them is different, and the same treatment can have different effects on different voters. But with randomized controlled trials, small variations from person to person can even out when you look at a larger group. With the advent of randomized experiments in elections a few decades ago, a whole new door was opened for studying the most effective ways to campaign.
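The "evening out" idea can be sketched with a small simulation. Everything here is a hypothetical assumption for illustration (the function name `simulate_trial`, the baseline and effect distributions, the numbers); it is not taken from any real election study.

```python
import random


def simulate_trial(n_voters, true_effect, seed=0):
    """Estimate a treatment effect as the difference in group means
    after randomly assigning half of the voters to treatment."""
    rng = random.Random(seed)
    # Every simulated voter has a different baseline (assumed distribution).
    baselines = [rng.gauss(0.5, 0.2) for _ in range(n_voters)]
    rng.shuffle(baselines)  # random assignment to the two groups
    half = n_voters // 2
    # The same treatment has a different effect on each treated voter.
    treated = [b + rng.gauss(true_effect, 0.1) for b in baselines[:half]]
    control = baselines[half:]
    return sum(treated) / len(treated) - sum(control) / len(control)


# With few voters the estimate is noisy; with many it settles near 0.05.
for n in (20, 2000, 200000):
    print(n, round(simulate_trial(n, true_effect=0.05), 4))
```

Because assignment is random, individual quirks land in both groups in roughly equal measure, so the difference in averages approaches the average effect as the group grows.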
Jun 1, 2015 • 18min
Falsifying Data
In the first of a few episodes on fraud in election research, we’ll take a look at a case study from a previous presidential election, where polling results were faked.
What are some telltale signs that data fraud might be present in a dataset? We’ll explore that in this episode.
May 20, 2015 • 11min
Reporter Bot
There’s a big difference between a table of numbers or statistics, and the underlying story that a human might tell about how those numbers were generated.
Think about a baseball game—the game stats and a newspaper story are describing the same thing, but one is a good input for a machine learning algorithm and the other is a good story to read over your morning coffee. Data science and machine learning are starting to bridge this gap, taking the raw data on things like baseball games, financial scenarios, etc. and automatically writing human-readable stories that are increasingly indistinguishable from what a human would write.
In this episode, we’ll talk about some examples of auto-generated content—you’ll be amazed at how sophisticated some of these reporter-bots can be. By the way, this summary was written by a human. (Or was it?)


