

Vanishing Gradients
Hugo Bowne-Anderson
A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
Episodes

May 1, 2022 • 1h 3min
Episode 7: The Evolution of Python for Data Science
Hugo speaks with Peter Wang, CEO of Anaconda, about how Python became so big in data science, machine learning, and AI. They jump into many of the technical and sociological beginnings of Python being used for data science, a history of PyData, the conda distribution, and NumFOCUS.
They also talk about the emergence of online collaborative environments, particularly with respect to open source, and attempt to figure out the moving parts of PyData and why it has had the impact it has, including the fact that many core developers were not computer scientists or software engineers, but rather scientists and researchers building tools on an as-needed basis.
They also discuss the challenges in getting adoption for Python, the things that the PyData stack solves, those that it doesn’t, and what progress is being made there.
People who have listened to Hugo’s podcast for some time may have recognized that he's interested in the sociology of the data science space, and he considered speaking with Peter a fascinating opportunity to delve into how the Pythonic data science space evolved, particularly with respect to tooling, not only because Peter had a front-row seat for much of it, but also because he was one of several key actors at various points. On top of this, Hugo wanted to allow Peter’s inner sociologist room to breathe and evolve in this conversation.
What happens then is slightly experimental: Peter is a deep, broad, and occasionally hallucinatory thinker, and Hugo wanted to explore new spaces with him, so we hope you enjoy the experiments they play as they begin to discuss open-source software in the broader context of finite and infinite games. They consider how OSS is a paradigm of humanity’s ability to create generative, nourishing, and anti-rivalrous systems, where anti-rivalrous things are those that become more valuable for everyone the more people use them! But we need to be mindful of finite-game dynamics (for example, those driven by corporate incentives) co-opting and parasitizing the generative systems that we build.
These are all considerations they delve far deeper into in Part 2 of this interview, which will be the next episode of VG, where we also dive into the relationship between OSS, tools, and venture capital, among many other things.
Links
Peter on twitter
Anaconda Nucleus
Calling out SciPy on diversity (even though it hurts) by Juan Nunez-Iglesias
Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
Finite and Infinite Games by James Carse
Governing the Commons: The Evolution of Institutions for Collective Action by Elinor Ostrom
Elinor Ostrom's 8 Principles for Managing a Commons

Apr 4, 2022 • 1h 27min
Episode 6: Bullshit Jobs in Data Science (and what to do about them)
Hugo speaks with Jacqueline Nolis, Chief Product Officer at Saturn Cloud (formerly Head of Data Science), about all types of failure modes in data science, ML, and AI, and they delve into bullshit jobs in data science (yes, that’s a technical term, as you’ll find out). They discuss the elements that are bullshit, the elements that aren’t, and how to increase the ratio of the latter to the former.
They also talk about her journey in moving from mainly working in prescriptive analytics, building reports in PDFs and PowerPoints, to deploying machine learning products in production. They delve into her move from doing data science to designing products for data scientists and how to think about choosing career paths. Jacqueline has been an individual contributor, a team lead, and a principal data scientist, so she has a lot of valuable experience here. They talk about her experience of transitioning gender while working in data science and they work hard to find a bright vision for the future of this industry!
Links
Jacqueline on twitter
Building a Career in Data Science by Jacqueline and Emily Robinson
Saturn Cloud
Why are we so surprised?, a post by Allen Downey on communicating and thinking through uncertainty
Data Mishaps Night!
The Trump administration’s “cubic model” of coronavirus deaths, explained by Matthew Yglesias
Working Class Deep Learner by Mark Saroufim

Mar 23, 2022 • 1h 48min
Episode 5: Executive Data Science
Hugo speaks with Jim Savage, the Director of Data Science at Schmidt Futures, about the need for data science in executive training and decision making, what data scientists can learn from economists, the perils of "data for good", and why you should always be integrating your loss function over your posterior.
Jim and Hugo talk about what data science is and isn’t capable of, what can actually deliver value, and what people really enjoy doing: the intersection in this Venn diagram is where we need to focus energy and it may not be quite what you think it is!
They then dive into Jim's thoughts on what he dubs Executive Data Science. You may be aware of the slicing of the data science and machine learning spaces into descriptive analytics, predictive analytics, and prescriptive analytics but, being the thought surgeon that he is, Jim proposes a different slicing into
(1) tool building OR data science as a product,
(2) tools to automate and augment parts of us, and
(3) what Jim calls Executive Data Science.
Jim and Hugo also talk about decision theory, the woeful state of causal inference techniques in contemporary data science, and what techniques it would behoove us all to import from econometrics and economics, more generally. If that’s not enough, they talk about the importance of thinking through the data generating process and things that can go wrong if you don’t. In terms of allowing your data work to inform your decision making, they also discuss Jim’s maxim “ALWAYS BE INTEGRATING YOUR LOSS FUNCTION OVER YOUR POSTERIOR”.
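As a rough illustration of the maxim (a sketch, not something taken from the episode): given draws from a posterior, you can estimate each candidate action's expected loss by averaging the loss over those draws, then pick the action with the smallest average. Everything concrete below is a hypothetical chosen only to make the example runnable: the Normal(100, 20) "posterior" over demand, the asymmetric `stocking_loss`, and the grid of candidate stock levels.

```python
import random


def expected_loss(action, posterior_draws, loss):
    """Monte Carlo estimate of the loss integrated over the posterior."""
    return sum(loss(action, theta) for theta in posterior_draws) / len(posterior_draws)


def best_action(actions, posterior_draws, loss):
    """Return the candidate action with the smallest estimated expected loss."""
    return min(actions, key=lambda a: expected_loss(a, posterior_draws, loss))


def stocking_loss(stock, demand):
    """Hypothetical asymmetric loss: running short costs 3x as much as overstocking."""
    if stock < demand:
        return 3.0 * (demand - stock)
    return 1.0 * (stock - demand)


# Stand-in for real posterior samples (assumption: demand ~ Normal(100, 20)).
rng = random.Random(0)
posterior_draws = [rng.gauss(100.0, 20.0) for _ in range(5000)]

actions = list(range(60, 161, 5))  # candidate stock levels
choice = best_action(actions, posterior_draws, stocking_loss)
posterior_mean = sum(posterior_draws) / len(posterior_draws)

print(f"posterior mean demand: {posterior_mean:.1f}")
print(f"stock level minimizing expected loss: {choice}")
```

Because the loss is asymmetric, the chosen stock level sits above the posterior mean: plugging in a point estimate and integrating the loss over the whole posterior lead to different decisions.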
Last but definitely not least, as Jim has worked in the data for good space for much of his career, they talk about what this actually means, with particular reference to fast.ai founder & QUT professor of practice Rachel Thomas’ blog post called “Doing Data Science for Social Good, Responsibly”. Rachel’s post takes as its starting point the following words of Sara Hooker, a researcher at Google Brain:
"Data for good" is an imprecise term that says little about who we serve, the tools used, or the goals. Being more precise can help us be more accountable & have a greater positive impact.
Jim and Hugo discuss his work in light of these foundational considerations.
Links
Jim on twitter
What Is Causal Inference? An Introduction for Data Scientists by Hugo Bowne-Anderson and Mike Loukides
Jim's must-watch Data Council talk on Productizing Structural Models
Mastering Metrics (https://www.masteringmetrics.com/) by Angrist and Pischke
Mostly Harmless Econometrics: An Empiricist's Companion by Angrist and Pischke
The Book of Why by Judea Pearl
Decision-Making in a Time of Crisis by Hugo Bowne-Anderson
Doing Data Science for Social Good, Responsibly by Rachel Thomas

Mar 9, 2022 • 1h 44min
Episode 4: Machine Learning at T-Mobile
Hugo speaks with Heather Nolis, Principal Machine Learning Engineer at T-Mobile, about what data science, machine learning, and AI look like at T-Mobile, along with Heather’s path from a software development intern there to principal ML engineer running a team of 15.
They talk about: how to build a DS culture from scratch and what executive-level support looks like, as well as how to demonstrate machine learning value early on, from a Shark Tank-style pitch night to the initial investment through to the POC and building out the function; all the great work they do with R and the Tidyverse in production; what it’s like to be a lesbian in tech, and what it was like to discover she was autistic and how that impacted her work; how to measure and demonstrate success and ROI for the org; some massive data science fails!; how to deal with execs wanting you to use the latest GPT-X in a fragmented tooling landscape; how to use the simplest technology to deliver the most value.
Finally, the team just hired their first full-time ethicist and they speak about how ethics can be embedded in a team and across an institution.
Links
Put R in prod: Tools and guides to put R models into production
Enterprise Web Services with Neural Networks Using R and TensorFlow
Heather on twitter
T-Mobile is hiring!
Hugo's upcoming fireside chat and AMA with Hilary Parker about how to actually produce sustainable business value using machine learning and product management for ML!

Mar 1, 2022 • 1h 33min
Episode 3: Language Tech For All
Rachael Tatman is a senior developer advocate for Rasa, where she’s helping developers build and deploy ML chatbots using their open source framework.
Rachael has a PhD in Linguistics from the University of Washington where her research was on computational sociolinguistics, or how our social identity affects the way we use language in computational contexts. Previously she was a data scientist at Kaggle and she’s still a Kaggle Grandmaster.
In this conversation, Rachael and I talk about the history of NLP and conversational AI/chatbots and we dive into the fascinating tension between rule-based techniques and ML and deep learning. We also talk about how to incorporate machine and human intelligence together by thinking through questions such as “should a response to a human ever be automated?” Spoiler alert: the answer is a resounding NO WAY!
In this journey, something that becomes apparent is that many of the trends, concepts, questions, and answers, although framed for NLP and chatbots, are applicable to much of data science, more generally.
We also discuss the data scientist’s responsibility to end-users and stakeholders using, among other things, the lens of considering those whose data you’re working with to be data donors.
We then consider what globalized language technology looks like and can look like, and what we can learn from the history of science here, particularly given that so much training data and so many models are in English when English accounts for so little of the language spoken globally.
Links
Rachael's website
Rasa
Speech and Language Processing by Dan Jurafsky and James H. Martin
Masakhane, putting African languages on the #NLP map since 2019
The Distributed AI Research Institute, a space for independent, community-rooted AI research, free from Big Tech’s pervasive influence
The Algorithmic Justice League, unmasking AI harms and biases
Black in AI, increasing the presence and inclusion of Black people in the field of AI by creating space for sharing ideas, fostering collaborations, mentorship and advocacy
Hugo's blog post on his new job and why it's exciting for him to double down on helping scientists do better science

Feb 20, 2022 • 1h 46min
Episode 2: Making Data Science Uncool Again
Jeremy Howard is a data scientist, researcher, developer, educator, and entrepreneur. Jeremy is a founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible. He is also a Distinguished Research Scientist at the University of San Francisco, the chair of WAMRI, and is Chief Scientist at platform.ai.
In this conversation, we’ll be talking about the history of data science, machine learning, and AI, where we’ve come from and where we’re going, how new techniques can be applied to real-world problems, whether it be applying deep learning to medicine or porting techniques from computer vision to NLP. We’ll also talk about what’s present and what’s missing in the ML skills revolution, what software engineering skills data scientists need to learn, how to cope in a space of such fragmented tooling, and paths for emerging out of the shadow of FAANG. If that’s not enough, we’ll jump into how spreading DS skills around the globe involves serious investments in education, building software, communities, and research, along with diving into the social challenges that the information age and the AI revolution (so to speak) bring with them.
But to get to all of this, you’ll need to listen to a few minutes of us chatting about chocolate biscuits in Australia!
Links
fast.ai · making neural nets uncool again
nbdev: create delightful python projects using Jupyter Notebooks
The fastai book, published as Jupyter Notebooks
Deep Learning for Coders with fastai and PyTorch
The wonderful and terrifying implications of computers that can learn: Jeremy's awesome TED talk!
Manna by Marshall Brain
Ghost Work by Mary L. Gray and Siddharth Suri
Uberland by Alex Rosenblat

Feb 16, 2022 • 5min
Episode 1: Introducing Vanishing Gradients
In this brief introduction, Hugo introduces the rationale behind launching a new data science podcast and gets excited about his upcoming guests: Jeremy Howard, Rachael Tatman, and Heather Nolis!
Original music, bleeps, and blops by local Sydney legend PlaneFace!