
Towards Data Science
Note: The TDS podcast's current run has ended.
Researchers and business leaders at the forefront of the field unpack the most pressing questions around data science and AI.
Latest episodes

Apr 22, 2020 • 43min
30. Interviewing the Medium data science team
Revenues drop unexpectedly, and management pulls aside the data science team into a room. The team is given its marching orders: “your job,” they’re told, “is to find out what the hell is going on with our purchase orders.”
That’s a very open-ended question, of course, because revenues and signups could drop for any number of reasons. Prices may have increased. A new user interface might be confusing potential customers. Seasonality effects might have to be considered. The source of the problem could be, well, anything.
That’s often the position data scientists find themselves in: rather than having a clear A/B test to analyze, they frequently are in the business of combing through user funnels to ensure that each stage is working as expected.
It takes a very detail-oriented and business-savvy team to pull off an investigation with that broad a scope, but that’s exactly what Medium has: a group of product-minded data scientists dedicated to investigating anomalies and identifying growth opportunities hidden in heaps of user data. They were kind enough to chat with me and talk about how Medium does data science for this episode of the Towards Data Science podcast.

Apr 15, 2020 • 40min
29. Cameron Davidson-Pilon - Data science at Shopify
If you want to know where data science is heading, it helps to know where it’s been. Very few people have that kind of historical perspective, and even fewer combine it with an understanding of cutting-edge tooling that hints at the direction the field might be taking in the future.
Luckily for us, one of them is Cameron Davidson-Pilon, the former Director of Data Science at Shopify. Cameron has been knee-deep in data science and estimation theory since 2012, when the space was still coming into its own. He’s got a great high-level perspective not only on technical issues but also on hiring and team-building, and he was kind enough to join us for today’s episode of the Towards Data Science podcast.

Apr 7, 2020 • 44min
28. Emily Robinson - Building a Career in Data Science
It’s easy to think of data science as a purely technical discipline: after all, it exists at the intersection of a number of genuinely technical topics, from statistics to programming to machine learning.
But there’s much more to data science and analytics than solving technical problems — and there’s much more to the data science job search than coding challenges and Kaggle competitions as well. Landing a job or a promotion as a data scientist calls on a ton of career skills and soft skills that many people don’t spend nearly enough time honing.
On this episode of the podcast, I spoke with Emily Robinson, an experienced data scientist and blogger with a pedigree that includes Etsy and DataCamp, about career-building strategies. Emily’s got a lot to say about the topic, particularly since she just finished authoring a book entitled “Build a Career in Data Science” with her co-author Jacqueline Nolis. The book explores a lot of great, practical strategies for moving data science careers forward, many of which we discussed during our conversation.

Mar 30, 2020 • 46min
27. Alayna Kennedy - AI safety, AI ethics and the AGI debate
Most of us believe that decisions that affect us should be made rationally: they should be reached by following a reasoning process that combines data we trust with a logic that we find acceptable.
As long as human beings are making these decisions, we can probe at that reasoning to find out whether we agree with it. We can ask why we were denied that bank loan, or why a judge handed down a particular sentence, for example.
Today, however, machine learning is automating away more and more of these important decisions, and as a result, our lives are increasingly governed by decision-making processes that we can’t interrogate or understand. Worse, machine learning algorithms can exhibit bias or make serious mistakes, so a black-box-ocracy risks becoming more like a dystopia than even the most imperfect human-designed systems we have today.
That’s why AI ethics and AI safety have drawn so much attention in recent years, and why I was so excited to talk to Alayna Kennedy, a data scientist at IBM whose work is focused on the ethics of machine learning, and the risks associated with ML-based decision-making. Alayna has consulted with key players in the US government’s AI effort, and has expertise applying machine learning in industry as well, through previous work on neural network modelling and fraud detection.

Mar 20, 2020 • 43min
26. Jeremy Howard - Coronavirus: the data behind the disease
In mid-January, China launched an official investigation into a string of unusual pneumonia cases in Hubei province. Within two months, that cluster of cases would snowball into a full-blown pandemic, with hundreds of thousands (perhaps even millions) of infections worldwide and the potential to unleash a wave of economic damage not seen since the 1918 Spanish influenza or the Great Depression.
The exponential growth that led us from a few isolated infections to where we are today is profoundly counterintuitive. And it poses many challenges for the epidemiologists who need to pin down the transmission characteristics of the coronavirus, and for the policy makers who must act on their recommendations, and convince a generally complacent public to implement life-saving social distancing measures.
With the coronavirus outbreak in full swing, I thought now would be a great time to reach out to Jeremy Howard, co-founder of the incredibly popular Fast.ai machine learning education site. Along with his co-founder Rachel Thomas, Jeremy authored a now-viral report outlining a data-driven case for concern regarding the coronavirus.

Mar 18, 2020 • 42min
25. Chris Parmer - Plotly founder on what data science is, and where it's going
It’s easy to think of data scientists as “people who explore and model data”. But in reality, the job description is much more flexible: your job as a data scientist is to solve problems that people actually have with data.
You’ll notice that I wrote “problems that people actually have” rather than “build models”. It’s relatively rare that the problems people have actually need to be solved using a predictive model. Instead, a good visualization or interactive chart is almost always the first step of the problem-solving process, and can often be the last as well.
And you know who understands visualization strategy really, really well? Plotly, that’s who. Plotly is a company that builds a ton of great open-source visualization, exploration and data infrastructure tools (and some proprietary commercial ones, too). Today, their tooling is being used by over 50 million people worldwide, and they’ve developed a number of tools and libraries that are now industry standard. So you can imagine how excited I was to speak with Plotly co-founder and Chief Product Officer Chris Parmer.
Chris had some great insights to share about data science and analytics tooling, including the future direction he sees the space moving in. But as his job title suggests, he’s also focused on another key characteristic that all great data scientists develop early on: product instinct (AKA: “knowing what to build next”).

Mar 10, 2020 • 40min
24. Xander Steenbrugge - Machine learning as a creative tool, and the quest for artificial general intelligence
Most machine learning models are used in roughly the same way: they take a complex, high-dimensional input (like a data table, an image, or a body of text) and return something very simple (a classification or regression output, or a set of cluster centroids). That makes machine learning ideal for automating repetitive tasks that might historically have been carried out only by humans.
But this strategy may not be the most exciting application of machine learning in the future: increasingly, researchers and even industry players are experimenting with generative models that produce much more complex outputs, like images and text, from scratch. These models are effectively carrying out a creative process, and mastering that process hugely widens the scope of what can be accomplished by machines.
My guest today is Xander Steenbrugge, and his focus is on the creative side of machine learning. In addition to consulting with large companies to help them put state-of-the-art machine learning models into production, he’s focused a lot of his work on more philosophical and interdisciplinary questions — including the interaction between art and machine learning. For that reason, our conversation went in an unusually philosophical direction, covering everything from the structure of language, to what makes natural language comprehension more challenging than computer vision, to the emergence of artificial general intelligence, and how all these things connect to the current state of the art in machine learning.

Mar 3, 2020 • 46min
23. Iain Harlow - Leaving academia for industry and optimizing how you learn
I can’t remember how many times I’ve forgotten something important.
I’m sure it’s a regular occurrence though: I constantly forget valuable life lessons, technical concepts and useful bits of statistical theory. What’s worse, I often forget these things after working bloody hard to learn them, so my forgetfulness is just a giant waste of time and energy.
That’s why I jumped at the chance to chat with Iain Harlow, VP of Science at Cerego — a company that helps businesses build training courses for their employees by optimizing the way information is served to maximize retention and learning outcomes.
Iain knows a lot about learning and has some great insights to share about how you can optimize your own learning, but he’s also got a lot of expertise solving data science problems and hiring data scientists — two things that he focuses on in his work at Cerego. He’s also a veteran of the academic world, and has some interesting observations to share about the difference between research in academia and research in industry.

Feb 23, 2020 • 41min
22. Luke Marsden - Data Science Infrastructure and MLOps
You train your model. You check its performance with a validation set. You tweak its hyperparameters, engineer some features and repeat. Finally, you try it out on a test set, and it works great!
Problem solved? Well, probably not.
Five years ago, your job as a data scientist might have ended here, but increasingly, the data science life cycle is expanding to include the steps after basic testing. This shouldn’t come as a surprise: now that machine learning models are being used for life-or-death and mission-critical applications, there’s growing pressure on data scientists and machine learning engineers to ensure that effects like feature drift are addressed reliably, that data science experiments are replicable, and that data infrastructure is reliable.
This episode’s guest is Luke Marsden, and he’s made these problems the focus of his work. Luke is the founder and CEO of Dotscience, a data infrastructure startup that’s creating a git-like tool for data science version control. Luke has spent most of his professional life working on infrastructure problems at scale, and has a lot to say about the direction data science and MLOps are heading in.

Feb 16, 2020 • 44min
21. Adam Waksman - Data science is becoming software engineering
When I think of the trends I’ve seen in data science over the last few years, perhaps the most significant and hardest to ignore has been the increased focus on deployment and productionization of models. Not all companies need models deployed to production, of course, but at those that do, there’s increasing pressure on data science teams to deliver software engineering along with machine learning solutions.
That’s why I wanted to sit down with Adam Waksman, Head of Core Technology at Foursquare. Foursquare is a company built on data and machine learning: they were one of the first fully scaled social media-powered recommendation services that gained real traction, and now help over 50 million people find restaurants and services in countries around the world.
Our conversation covered a lot of ground, from the interaction between software engineering and data science, to what he looks for in new hires, to the future of the field as a whole.