Linear Digressions

Ben Jaffe and Katie Malone
undefined
Jan 20, 2019 • 17min

Heterogeneous Treatment Effects

When data scientists use a linear regression to look for causal relationships between a treatment and an outcome, what they’re usually finding is the so-called average treatment effect. In other words, on average, here’s what the treatment does in terms of making a certain outcome more or less likely to happen. But there’s more to life than averages: sometimes the relationship works one way in some cases, and another way in other cases, such that the average isn’t giving you the whole story. In that case, you want to start thinking about heterogeneous treatment effects, and this is the podcast episode for you. Relevant links: https://eng.uber.com/analyzing-experiment-outcomes/ https://multithreaded.stitchfix.com/blog/2018/11/08/bandits/ https://www.locallyoptimistic.com/post/against-ab-tests/
undefined
Jan 14, 2019 • 28min

Pre-training language models for natural language processing problems

When you build a model for natural language processing (NLP), such as a recurrent neural network, it helps a ton if you’re not starting from zero. In other words, if you can draw upon other datasets for building your understanding of word meanings, and then use your training dataset just for subject-specific refinements, you’ll get farther than just using your training dataset for everything. This idea of starting with some pre-trained resources has an analogue in computer vision, where initializations from ImageNet used for the first few layers of a CNN have become the new standard. There’s a similar progression under way in NLP, where simple(r) embeddings like word2vec are giving way to more advanced pre-processing methods that aim to capture more sophisticated understanding of word meanings, contexts, language structure, and more. Relevant links: https://thegradient.pub/nlp-imagenet/
undefined
Jan 7, 2019 • 43min

Facial Recognition, Society, and the Law

Facial recognition being used in everyday life seemed far-off not too long ago. Increasingly, it’s being used and advanced widely and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting, provocative, and even if you don’t agree with every point they make or harbor some skepticism, there’s a lot to think about in what they’re saying. https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/
undefined
Dec 31, 2018 • 18min

Re-release: Word2Vec

Bringing you another old classic this week, as we gear up for 2019! See you next week with new content. Word2Vec is probably the go-to algorithm for vectorizing text data these days.  Which makes sense, because it is wicked cool.  Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually).  And all that's before we get to the part about how Word2Vec allows you to do algebra with text.  Seriously, this stuff is cool.
undefined
Dec 23, 2018 • 16min

Re - Release: The Cold Start Problem

We’re taking a break for the holidays, chilling with the dog and an eggnog (Katie) and the cat and some spiced cider (Ben). Here’s an episode from a while back for you to enjoy. See you again in 2019! You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier. Turns out machine learning algorithms, and especially recommendation engines, feel the same way. The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere. The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts.
undefined
Dec 17, 2018 • 20min

Convex (and non-convex) Optimization

Convex optimization is one of the keys to data science, both because some problems straight-up call for optimization solutions and because popular algorithms like a gradient descent solution to ordinary least squares are supported by optimization techniques. But there are all kinds of subtleties, starting with convex and non-convex functions, why gradient descent is really an optimization problem, and what that means for your average data scientist or statistician.
undefined
Dec 9, 2018 • 27min

The Normal Distribution and the Central Limit Theorem

When you think about it, it’s pretty amazing that we can draw conclusions about huge populations, even the whole world, based on datasets that are comparatively very small (a few thousand, or a few hundred, or even sometimes a few dozen). That’s the power of statistics, though. This episode is kind of a two-for-one but we’re excited about it—first we’ll talk about the Normal or Gaussian distribution, which is maybe the most famous probability distribution function out there, and then turn to the Central Limit Theorem, which is one of the foundational tenets of statistics and the real reason why the Normal distribution is so important.
undefined
Dec 2, 2018 • 17min

Software 2.0

Neural nets are a way you can model a system, sure, but if you take a step back, squint, and tilt your head, they can also be called… software? Not in the sense that they’re written in code, but in the sense that the neural net itself operates under the same set of general requirements as does software that a human would write. Namely, neural nets take inputs and create outputs from them according to a set of rules, but the thing about the inside of the neural net black box is that it’s written by a computer, whereas the software we’re more familiar with is written by a human. Neural net researcher and Tesla director of AI Andrej Karpathy has taken to calling neural nets “Software 2.0” as a result, and the implications from this connection are really cool. We’ll talk about it this week. Relevant links: https://medium.com/@karpathy/software-2-0-a64152b37c35
undefined
Nov 18, 2018 • 27min

Limitations of Deep Nets for Computer Vision

Deep neural nets have a deserved reputation as the best-in-breed solution for computer vision problems. But there are many aspects of human vision that we take for granted but where neural nets struggle—this episode covers an eye-opening paper that summarizes some of the interesting weak spots of deep neural nets. Relevant links: https://arxiv.org/abs/1805.04025
undefined
Nov 12, 2018 • 25min

Building Data Science Teams

At many places, data scientists don’t work solo anymore—it’s a team sport. But data science teams aren’t simply teams of data scientists working together. Instead, they’re usually cross-functional teams with engineers, managers, data scientists, and sometimes others all working together to build tools and products around data science. This episode talks about some of those roles on a typical data science team, what the responsibilities are for each role, and what skills and traits are most important for each team member to have.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app