
Vanishing Gradients
A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
Latest episodes

May 31, 2024 • 1h 32min
Episode 27: How to Build Terrible AI Systems
Jason Liu, an independent consultant in recommendation systems, discusses building AI applications, his playbook for ML, and how to avoid common pitfalls. They focus on designing deliberately terrible AI systems as a way to learn how to prevent failures. The conversation also covers consulting across industries, future tooling, and creating robust AI systems.

May 15, 2024 • 1h 52min
Episode 26: Developing and Training LLMs From Scratch
Sebastian Raschka discusses developing and training large language models (LLMs) from scratch, covering topics like prompt engineering, fine-tuning, and RAG systems. They explore the skills, resources, and hardware needed, the lifecycle of LLMs, live coding to create a spam classifier, and the importance of hands-on experience. They also touch on using PyTorch Lightning and Fabric for managing large models, and share insights on techniques in natural language processing models and evaluating LLMs for classification problems.

Mar 18, 2024 • 1h 21min
Episode 25: Fully Reproducible ML & AI Workflows
Omoju Miller, a machine learning expert and CEO of Fimio, shares her vision for transparent and reproducible ML workflows. She discusses the necessity of open tools and data in combating the monopolization of tech by closed-source APIs. Topics include the evolution of developer tools, the importance of data provenance, and the potential of a collaborative open compute ecosystem. Omoju also emphasizes user accessibility in machine learning and envisions a future where everyone can build production-ready applications with ease.

Feb 27, 2024 • 1h 30min
Episode 24: LLM and GenAI Accessibility
Hugo and Johno discuss the evolution of tooling and accessibility in AI over the past decade, highlighting advancements in using big models from Hugging Face and hi-res satellite data. They delve into the Generative AI mindset, democratizing deep learning with fast.ai, and the importance of UX in generative AI applications. The discussion also covers the skill set needed to be an LLM and AI guru, as well as efforts at answer.ai to democratize LLMs and foundation models.

Dec 20, 2023 • 1h 21min
Episode 23: Statistical and Algorithmic Thinking in the AI Age
Allen Downey discusses statistical paradoxes and fallacies in using data, including the base rate fallacy and algorithmic fairness. They dive into examples like COVID vaccination data and explore the challenges of interpreting statistical information correctly. The conversation also covers topics such as epidemiological paradoxes, Gaussian distributions, and the importance of understanding biases in data interpretation for media consumption.

Nov 27, 2023 • 1h 20min
Episode 22: LLMs, OpenAI, and the Existential Crisis for Machine Learning Engineering
Jeremy Howard (Fast.ai), Shreya Shankar (UC Berkeley), and Hamel Husain (Parlance Labs) join Hugo Bowne-Anderson to talk about how LLMs and OpenAI are changing the worlds of data science, machine learning, and machine learning engineering.
Jeremy Howard is co-founder of fast.ai, an ex-Chief Scientist at Kaggle, and creator of the ULMFiT approach on which all modern language models are based. Shreya Shankar is at UC Berkeley, previously at Google Brain, Facebook, and Viaduct. Hamel Husain runs his own generative AI and LLM consultancy, Parlance Labs, and was previously at Outerbounds, GitHub, and Airbnb.
They talk about:
How LLMs shift the nature of the work we do in DS and ML,
How they change the tools we use,
The ways in which they could displace the role of traditional ML (e.g. will we stop using xgboost any time soon?),
How to navigate all the new tools and techniques,
The trade-offs between open and closed models,
Reactions to the recent OpenAI Developer Day and the deepening existential crisis for ML.
LINKS
The panel on YouTube
Hugo and Jeremy's upcoming livestream on what the hell happened recently at OpenAI, among many other things
Vanishing Gradients on YouTube
Vanishing Gradients on Twitter

Nov 14, 2023 • 1h 8min
Episode 21: Deploying LLMs in Production: Lessons Learned
Guest Hamel Husain, a machine learning engineer, discusses the business value of large language models (LLMs) and generative AI. They cover common misconceptions, necessary skills, and techniques for working with LLMs. The episode explores the challenges of working with ML software and ChatGPT, the importance of data cleaning and analysis, and deploying LLMs in production with guardrails. They also discuss an AI-powered real estate CRM and optimizing marketing strategies through data analysis.

Oct 5, 2023 • 1h 27min
Episode 20: Data Science: Past, Present, and Future
Chris Wiggins, chief data scientist for the New York Times, and Matthew Jones, professor of history at Princeton University, discuss their book on the history of data and its impact on society. They explore topics such as the use of data for decision making, the development of statistical techniques, the influence of Francis Galton on eugenics, and the rise of data, compute, and algorithms in various fields.

Aug 14, 2023 • 1h 23min
Episode 19: Privacy and Security in Data Science and Machine Learning
Hugo chats with Katharine Jarmul, a Principal Data Scientist at Thoughtworks Germany, specializing in privacy and ethics in data workflows. They dive into the vital distinctions between data privacy and security, demystifying common misconceptions. Katharine highlights the impact of GDPR and CCPA, and explores advanced concepts like federated learning and differential privacy. They also tackle real-world issues like privacy attacks and the ethical responsibilities of data scientists, making a compelling case for prioritizing privacy in data practices.

May 24, 2023 • 1h 13min
Episode 18: Research Data Science in Biotech
Eric Ma, a research team leader at Moderna Therapeutics, discusses the tools and techniques used for drug discovery, the importance of machine learning and Bayesian inference, and the cultural questions surrounding hiring and management in research data science in biotech. They also explore the tech stack used in his team's work, skills and hiring considerations in biotech, the importance of data testing and standardizing Excel spreadsheets, and the current state and challenges of Bayesian inference.