Linear Digressions

Ben Jaffe and Katie Malone
undefined
Aug 28, 2017 • 16min

Adversarial Examples

Even as we rely more and more on machine learning algorithms to help with everyday decision-making, we're learning more and more about how they're frighteningly easy to fool sometimes. Today we have a roundup of a few successful efforts to create robust adversarial examples, including what it means for an adversarial example to be robust and what this might mean for machine learning in the future.
undefined
Aug 21, 2017 • 16min

Jupyter Notebooks

This week's episode is just in time for JupyterCon in NYC, August 22-25... Jupyter notebooks are probably familiar to a lot of data nerds out there as a great open-source tool for exploring data, doing quick visualizations, and packaging code snippets with explanations for sharing your work with others. If you're not a data person, or you are but you haven't tried out Jupyter notebooks yet, here's your nudge to go give them a try. In this episode we'll go back to the old days, before notebooks, and talk about all the ways that data scientists like to work that wasn't particularly well-suited to the command line + text editor setup, and talk about how notebooks have evolved over their lifetime to become even more powerful and well-suited to the data scientist's workflow.
undefined
Aug 14, 2017 • 19min

Curing Cancer with Machine Learning is Super Hard

Today, a dispatch on what can go wrong when machine learning hype outpaces reality: a high-profile partnership between IBM Watson and MD Anderson Cancer Center has recently hit the rocks as it turns out to be tougher than expected to cure cancer with artificial intelligence.  There are enough conflicting accounts in the media to make it tough to say exactly went wrong, but it's a good chance to remind ourselves that even in a post-AI world, hard problems remain hard.
undefined
Aug 7, 2017 • 26min

KL Divergence

Kullback Leibler divergence, or KL divergence, is a measure of information loss when you try to approximate one distribution with another distribution.  It comes to us originally from information theory, but today underpins other, more machine-learning-focused algorithms like t-SNE.  And boy oh boy can it be tough to explain.  But we're trying our hardest in this episode!
undefined
Jul 31, 2017 • 26min

Sabermetrics

It's moneyball time! SABR (the Society for American Baseball Research) is the world's largest organization of statistics-minded baseball enthusiasts, who are constantly applying the craft of scientific analysis to trying to figure out who are the best baseball teams and players. It can be hard to objectively measure sports greatness, but baseball has a data-rich history and plenty of nerdy fans interested in analyzing that data. In this episode we'll dissect a few of the metrics from standard baseball and compare them to related metrics from Sabermetrics, so you can nerd out more effectively at your next baseball game.
undefined
Jul 24, 2017 • 24min

What Data Scientists Can Learn from Software Engineers

We're back again with friend of the pod Walt, former software engineer extraordinaire and current data scientist extraordinaire, to talk about some best practices from software engineering that are ready to jump the fence over to data science.  If last week's episode was for software engineers who are interested in becoming more like data scientists, then this week's episode is for data scientists who are looking to improve their game with best practices from software engineering.
undefined
Jul 17, 2017 • 19min

Software Engineering to Data Science

Data scientists and software engineers often work side by side, building out and scaling technical products and services that are data-heavy but also require a lot of software engineering to build and maintain.  In this episode, we'll chat with a Friend of the Pod named Walt, who started out as a software engineer but works as a data scientist now.  We'll talk about that transition from software engineering to data science, and what special capabilities software engineers have that data scientists might benefit from knowing about (and vice versa).
undefined
Jul 10, 2017 • 12min

Re-Release: Fighting Cholera with Data, 1854

This episode was first released in November 2014. In the 1850s, there were a lot of things we didn’t know yet: how to create an airplane, how to split an atom, or how to control the spread of a common but deadly disease: cholera. When a cholera outbreak in London killed scores of people, a doctor named John Snow used it as a chance to study whether the cause might be very small organisms that were spreading through the water supply (the prevailing theory at the time was miasma, or “bad air”). By tracing the geography of all the deaths from the outbreak, Snow was practicing elementary data science--and stumbled upon one of history’s most famous outliers. In this episode, we’ll tell you more about this single data point, a case of cholera that cracked the case wide open for Snow and provided critical validation for the germ theory of disease.
undefined
Jul 2, 2017 • 32min

Re-Release: Data Mining Enron

This episode was first release in February 2015. In 2000, Enron was one of the largest and companies in the world, praised far and wide for its innovations in energy distribution and many other markets. By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared. In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron emails corpus. Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again. But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields.
undefined
Jun 26, 2017 • 20min

Factorization Machines

What do you get when you cross a support vector machine with matrix factorization? You get a factorization machine, and a darn fine algorithm for recommendation engines.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app