
Towards Data Science
Note: The TDS podcast's current run has ended.
Researchers and business leaders at the forefront of the field unpack the most pressing questions around data science and AI.
Latest episodes

Sep 9, 2020 • 56min
50. Ken Jee - Building your brand in data science
It’s no secret that data science is an area where brand matters a lot.
In fact, if there’s one thing I’ve learned from A/B testing ways to help job-seekers get hired at SharpestMinds, it’s that blogging, maintaining a good presence on social media, making open-source contributions, podcasting and speaking at meetups are among the best ways to get noticed by employers.
Brand matters. And if there’s one person who has a deep understanding of the value of brand in data science, and of how to build one, it’s data scientist and YouTuber Ken Jee. Ken not only has experience as a data scientist and sports analyst, having worked at DraftKings and GE, but he has also founded a number of companies, and his YouTube channel, with over 60,000 subscribers, is one of his main projects today.
For today’s episode, I spoke to Ken about brand-building strategies in data science, as well as job search tips for anyone looking to land their first data-related role.

Sep 2, 2020 • 57min
49. Catherine Zhou - The data science of learning
If you’re interested in upping your coding game, or your data science game in general, then it’s worth taking some time to understand the process of learning itself.
And if there’s one company that’s studied the learning process more than almost anyone else, it’s Codecademy. With over 65 million users, Codecademy has developed a deep understanding of what it takes to get people to learn how to code, which is why I wanted to speak to their Head of Data Science, Cat Zhou, for this episode of the podcast.

Aug 26, 2020 • 46min
48. Emmanuel Ameisen - Beyond the Jupyter notebook: how to build data science products
Data science is about much more than Jupyter notebooks, because data science problems are about more than machine learning.
What data should I collect? How good does my model need to be to be “good enough” to solve my problem? What form should my project take for it to be useful? Should it be a dashboard, a live app, or something else entirely? How do I deploy it? How do I make sure something awful and unexpected doesn’t happen when it’s deployed in production?
None of these questions can be answered by importing sklearn and pandas and hacking away in a Jupyter notebook. Data science problems take a unique combination of business savvy and software engineering know-how, and that’s why Emmanuel Ameisen wrote a book called Building Machine Learning Powered Applications: Going from Idea to Product. Emmanuel is a machine learning engineer at Stripe, and formerly worked as Head of AI at Insight Data Science, where he oversaw the development of dozens of machine learning products.
Our conversation was focused on the missing links in most online data science education: business instinct, data exploration, model evaluation and deployment.

Aug 19, 2020 • 51min
47. Goku Mohandas - Industry research and how to show off your projects
Project-building is the single most important activity you can take on if you’re trying to keep your machine learning skills sharp or break into data science. But a project won’t do you much good unless you can show it off effectively and get feedback to iterate on it, and until recently, there weren’t many places you could turn to for that.
A recent open-source initiative called MadeWithML is trying to change that by creating an easily shareable repository of crowdsourced data science and machine learning projects. Its founder, former Apple ML researcher and startup founder Goku Mohandas, sat down with me for this episode of the TDS podcast to discuss data science projects, his experiences doing research in industry, and the MadeWithML project.

Aug 12, 2020 • 39min
46. Ihab Ilyas - Data cleaning is finally being automated
It’s cliché to say that data cleaning accounts for 80% of a data scientist’s job, but it’s directionally true.
That’s too bad, because fun things like data exploration, visualization and modelling are the reason most people get into data science. So it’s a good thing that there’s a major push underway in industry to automate data cleaning as much as possible.
One of the leaders of that effort is Ihab Ilyas, a professor at the University of Waterloo and founder of two companies, Tamr and Inductiv, both of which are focused on the early stages of the data science lifecycle: data cleaning and data integration. Ihab knows an awful lot about data cleaning and data engineering, and has some really great insights to share about the future direction of the space — including what work is left for data scientists, once you automate away data cleaning.

Aug 5, 2020 • 51min
45. Kenny Ning - Is data science merging with data engineering?
There’s been a lot of talk about the future direction of data science, and for good reason. The space is finally coming into its own, and as the Wild West phase of the mid-2010s well and truly comes to an end, data professionals are keenly interested in staying ahead of the curve and understanding what their jobs are likely to look like 2, 5 and 10 years down the road.
And amid all the noise, one trend is clearly emerging, and has already materialized to a significant degree: as more and more of the data science lifecycle is automated or abstracted away, data professionals can afford to spend more time adding value to companies in more strategic ways. One way to do this is to invest your time in deepening your subject matter expertise and mastering the business side of the equation. Another is to double down on technical skills and focus on owning more and more of the data stack, particularly the productionization and deployment stages.
My guest for today’s episode of the Towards Data Science podcast, Kenny Ning, has been down both of these paths, first as a business-focused data scientist at Spotify, where he spent his time defining business metrics and evaluating products, and second as a data engineer at Better.com, where his focus has shifted towards productionization and engineering. During our chat, Kenny shared his insights about the relative merits of each approach, and the future of the field.

Jul 29, 2020 • 53min
44. Jakob Foerster - Multi-agent reinforcement learning and the future of AI
Reinforcement learning has gotten a lot of attention recently, thanks in large part to systems like AlphaGo and AlphaZero, which have highlighted its immense potential in dramatic ways. And while the RL systems we’ve developed have accomplished some impressive feats, they’ve done so in a fairly naive way. Specifically, they haven’t tended to confront multi-agent problems, which require collaboration and competition. And even when multi-agent problems have been tackled, they’ve been addressed using agents that just assume other agents are an uncontrollable part of the environment, rather than entities with rich internal structures that can be reasoned about and communicated with.
That’s all finally changing, with new research into the field of multi-agent RL, led in part by OpenAI, Oxford and Google alum, and current FAIR research scientist Jakob Foerster. Jakob’s research is aimed specifically at understanding how reinforcement learning agents can learn to collaborate better and navigate complex environments that include other agents, whose behavior they try to model. In essence, Jakob is working on giving RL agents a theory of mind.

Jul 22, 2020 • 39min
43. Ian Scott - Data science at Deloitte
Data science can look very different from one company to the next, and it’s generally difficult to get a consistent opinion on the question of what a data scientist really is.
That’s why it’s so important to speak with data scientists who apply their craft at different organizations — from startups to enterprises. Getting exposure to the full spectrum of roles and responsibilities that data scientists are called on to execute is the only way to distill data science down to its essence.
So I wanted to chat with Ian Scott, Chief Science Officer at Deloitte Omnia, Deloitte’s AI practice. Ian was doing data science as far back as the late 1980s, when he was applying statistical modeling to data from experimental high energy physics as part of his PhD work at Harvard. Since then, he’s occupied strategic roles at a number of companies, most recently Deloitte, where he leads significant machine learning and data science projects.

Jul 15, 2020 • 56min
42. Will Grathwohl - Energy-based models and the future of generative algorithms
Machine learning in grad school and machine learning in industry are very different beasts. In industry, deployment and data collection become key, and the only thing that matters is whether you can deliver a product that real customers want, fast enough to meet internal deadlines. In grad school, there’s a different kind of pressure, focused on algorithm development and novelty. It’s often difficult to know which path you might be best suited for, but that’s why it can be so useful to speak with people who’ve done both — and bonus points if their academic research experience comes from one of the top universities in the world.
For today’s episode of the Towards Data Science podcast, I sat down with Will Grathwohl, a PhD student at the University of Toronto, student researcher at Google AI, and alum of MIT and OpenAI. Will has seen cutting-edge machine learning research in both industry and academic settings, and has some great insights to share about the differences between the two environments. He’s also recently published an article on the fascinating topic of energy-based models, in which he and his co-authors propose a unique way of thinking about generative models that achieves state-of-the-art performance in computer vision tasks.

Jul 8, 2020 • 45min
41. Solmaz Shahalizadeh - Data science in high-growth companies
One of the themes that I’ve seen come up increasingly in the past few months is the critical importance of product thinking in data science. As new and aspiring data scientists deepen their technical skill sets and invest countless hours doing practice problems on LeetCode, product thinking has emerged as a pretty serious blind spot for many applicants. That blind spot has become increasingly critical as new tools have emerged that abstract away a lot of what used to be the day-to-day grunt work of data science, allowing data scientists more time to develop subject matter expertise and focus on the business value side of the product equation.
If there’s one company that’s made a name for itself for leading the way on product-centric thinking in data science, it’s Shopify. And if there’s one person at Shopify who’s spent the most time thinking about product-centered data science, it’s Shopify’s Head of Data Science and Engineering, Solmaz Shahalizadeh. Solmaz has had an impressive career arc, which included joining Shopify in its pre-IPO days, back in 2013, and seeing the Shopify data science team grow from a handful of people to a pivotal organization-wide effort that tens of thousands of merchants rely on to earn a living today.