Linear Digressions
Ben Jaffe and Katie Malone
Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
Episodes
Mar 4, 2015 • 9min
Random Kanye
Ever feel like you could randomly assemble words from a certain vocabulary and make semi-coherent Kanye West lyrics? Or technical documentation, imitations of local newscasters, your politically outspoken uncle, etc.? Wonder no more: there's a way to do exactly this. It's called a Markov chain, and it's a powerful way to generate made-up data that you can then use for fun and profit. The idea behind a Markov chain is that you probabilistically generate a sequence of steps, numbers, words, etc., where each next step/number/word depends only on the previous one, which makes the sequence fast and efficient to generate computationally. Usually Markov chains are put to serious academic uses, but this ain't one of them: here they're used to randomly generate rap lyrics based on Kanye West lyrics.
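As a rough illustration of the idea, here is a minimal word-level Markov chain sketch in Python. The function names `build_chain` and `generate` and the placeholder corpus are assumptions made for this example; they are not taken from the episode, and the corpus is not real lyrics.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, seed=None):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        # Repeated followers make frequent transitions more likely.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Placeholder corpus (an assumption for the example, not real lyrics).
corpus = "the beat goes on and the beat goes up and the night goes on"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=0))
```

Storing every observed follower, duplicates included, means a uniform random pick already reproduces the transition frequencies seen in the corpus, so no explicit probability table is needed.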
Feb 25, 2015 • 9min
Lie Detectors
Often machine learning discussions center around algorithms, or features, or datasets--this one centers around interpretation, and ethics.
Suppose you could use a technology like fMRI to see what regions of a person's brain are active when they answer questions. And also suppose that you could run trials where you watch their brain activity while they lie about some minor issue (say, whether the card in their hand is a spade or a club)--could you use machine learning to analyze those images, and use the patterns in them for lie detection? Well, you certainly can try, and indeed researchers have done just that.
There are important problems, though--brain images can vary widely, meaning that for any given person, there might not be a lot of certainty about whether they're lying or not. It's also open to debate whether the training set (in this case, test subjects with playing cards in their hands) really generalizes well to the more important cases, like a person accused of a crime.
So while machine learning has yielded some impressive gains in lie detection, it is not a solution to these thornier scientific issues.
http://www.amacad.org/pdfs/deceit.pdf
Feb 9, 2015 • 12min
The Enron Dataset
The podcast discusses the Enron emails corpus, a dataset used in machine learning, and its significance. It explores privacy concerns, algorithm development, data cleaning, and the uses of the Enron dataset in studying corporate fraud.
Feb 4, 2015 • 13min
Labels and Where To Find Them
Explore the challenges of gathering labeled data for machine learning, with examples including lie detection using brain images and automated image captioning. Discover the difficulties in obtaining accurate labels through platforms like Amazon's Mechanical Turk. Learn about the significance of semantic analysis in natural language processing and the value of labeled data in data science. Hear about innovative strategies for obtaining labeled data and the scientific value of human contribution in projects like Galaxy Zoo and Higgshunters.org.
Jan 23, 2015 • 13min
Um Detector 1
The hosts discuss their experience editing the podcast and propose developing an 'um detector'. They dive into sound representation, decoding audio files and variable lengths in machine learning. They also discuss the process of developing a machine learning algorithm to detect specific signals in audio data, including encoding files into integers and overcoming roadblocks.
Jan 7, 2015 • 12min
Better Facial Recognition with Fisherfaces
The podcast explores the challenges of facial recognition and how it can be affected by variations in expression, lighting, and angle. The Fisherfaces algorithm is introduced as a more robust alternative to eigenfaces: it uses Fisher's linear discriminant to find projections that maximize the separation between different people's faces relative to the variation among images of the same person. The episode covers the exploration and analysis of facial features, the comparison between Fisher's linear discriminant and eigenfaces, and the superior performance of Fisherfaces in challenging situations.
Jan 7, 2015 • 10min
Facial Recognition with Eigenfaces
Facial recognition in machine learning is a challenging task due to the high dimensionality of pictures. Principal Component Analysis (PCA) helps reduce dimensions and identify important patterns. Eigenfaces, the principal components obtained by running PCA on a set of face images, are composite features from which faces can be approximately reconstructed. They play a crucial role in successful facial recognition algorithms, despite their ghostly appearance and the difficulty of interpreting them.
Dec 17, 2014 • 13min
Stats of World Series Streaks
The podcast explores the significance of the number of games in the World Series and how teams in baseball have lower winning percentages compared to other sports. The hosts discuss the assumptions of the binomial distribution and test them by comparing predicted and actual distributions of World Series games. They analyze the percentages of different game lengths and speculate on the influence of fan emotions. The podcast also delves into the mysterious nature of World Series streaks and their intrigue.
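To make the binomial comparison concrete, here is a small Python sketch of the predicted distribution of series lengths. It assumes a best-of-seven series in which every game is independent and each team wins any given game with a fixed probability; the function name `series_length_probs` and the default of 0.5 are assumptions for illustration, not figures from the episode.

```python
from math import comb


def series_length_probs(p=0.5, wins_needed=4):
    """P(series ends in exactly n games) when every game is independent
    and team A wins each game with probability p (binomial assumptions)."""
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series winner takes game n and exactly
        # wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        a_wins = ways * p ** wins_needed * (1 - p) ** (n - wins_needed)
        b_wins = ways * (1 - p) ** wins_needed * p ** (n - wins_needed)
        probs[n] = a_wins + b_wins
    return probs


for games, prob in series_length_probs().items():
    print(f"{games} games: {prob:.4f}")
```

With evenly matched teams this predicts 12.5% four-game sweeps, 25% five-game series, and 31.25% each for six and seven games. Comparing those numbers against the historical counts is one way to test whether the independence assumption holds.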
Nov 26, 2014 • 9min
Computers Try to Tell Jokes
Computers are capable of many impressive feats, but making you laugh is usually not one of them. Or could it be? This episode will talk about a custom-built machine learning algorithm that searches through text and writes jokes based on what it finds.
The jokes are formulaic: they're all of the form "I like my X like I like my Y: Z" where X and Y are nouns, and Z is an adjective that can describe both X and Y. For (dumb) example, "I like my men like I like my coffee: steaming hot." The joke is funny when ZX and ZY are both very common phrases, but X and Y are rarely seen together.
So, given a large enough corpus of text, the algorithm looks for triplets of words that fit this description and writes jokes based on them. Are the jokes funny? You be the judge...
http://homepages.inf.ed.ac.uk/s0894589/petrovic13unsupervised.pdf
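The search described above can be sketched in a few lines of Python. This is a toy illustration only: the counts are made up, and the scoring rule (how common the two phrases are, divided by how often the nouns appear together) is a simplification assumed for this example, not the model from the linked paper.

```python
from itertools import combinations

# Hypothetical counts of "adjective noun" phrases; a real system would
# extract these from a large corpus.
phrase_counts = {
    ("hot", "coffee"): 50,
    ("hot", "men"): 30,
    ("hot", "tea"): 40,
    ("cold", "coffee"): 10,
    ("cold", "tea"): 12,
}

# Hypothetical counts of how often two nouns appear together.
noun_cooccurrence = {
    frozenset(("coffee", "tea")): 60,
    frozenset(("coffee", "men")): 1,
    frozenset(("men", "tea")): 2,
}


def score(x, y, z):
    """High when 'z x' and 'z y' are both common but x and y rarely co-occur."""
    common = min(phrase_counts.get((z, x), 0), phrase_counts.get((z, y), 0))
    together = noun_cooccurrence.get(frozenset((x, y)), 0)
    return common / (1 + together)


def best_jokes(top=3):
    """Rank every (X, Y, Z) triple and fill in the joke template."""
    nouns = sorted({noun for _, noun in phrase_counts})
    adjectives = sorted({adj for adj, _ in phrase_counts})
    candidates = [
        (score(x, y, z), x, y, z)
        for x, y in combinations(nouns, 2)
        for z in adjectives
    ]
    candidates.sort(reverse=True)
    return [
        f"I like my {x} like I like my {y}: {z}"
        for s, x, y, z in candidates[:top]
        if s > 0
    ]


for joke in best_jokes():
    print(joke)
```

In this toy data, "coffee" and "tea" share the adjective "hot" but appear together often, so that pairing scores low, while "coffee" and "men" rarely co-occur and rank first.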

Nov 22, 2014 • 11min
How Outliers Helped Defeat Cholera
In the 1850s, Dr. John Snow solved the mystery of a cholera outbreak in London by mapping the geography of deaths and investigating the outliers. His findings lent early support to the germ theory of disease and changed our understanding of disease transmission.


