Vanishing Gradients

Latest episodes

Aug 14, 2023 • 1h 23min

Episode 19: Privacy and Security in Data Science and Machine Learning

Hugo chats with Katharine Jarmul, a Principal Data Scientist at Thoughtworks Germany, specializing in privacy and ethics in data workflows. They dive into the vital distinctions between data privacy and security, demystifying common misconceptions. Katharine highlights the impact of GDPR and CCPA, and explores advanced concepts like federated learning and differential privacy. They also tackle real-world issues like privacy attacks and the ethical responsibilities of data scientists, making a compelling case for prioritizing privacy in data practices.
5 snips
May 24, 2023 • 1h 13min

Episode 18: Research Data Science in Biotech

Eric Ma, a leader in the research team at Moderna Therapeutics, discusses the tools and techniques used for drug discovery, the importance of machine learning and Bayesian inference, and the cultural questions surrounding hiring and management in research data science in biotech. They also explore the tech stack used in their work, the skills and hiring considerations in biotech, the importance of data testing and standardizing Excel spreadsheets, and the current state and challenges of Bayesian inference.
Feb 17, 2023 • 1h 16min

Episode 17: End-to-End Data Science

Hugo speaks with Tanya Cashorali, a data scientist and consultant who helps businesses get the most out of data, about what end-to-end data science looks like across many industries, such as retail, defense, biotech, and sports, including scoping out projects, figuring out the correct questions to ask, how projects can change, delivering on the promise, the importance of rapid prototyping, what it means to put models in production, and how to measure success. And much more, all the while grounding their conversation in real-world examples from data science, business, and life. In a world where most organizations think they need AI and yet only 10-15% of data science actually involves model building, it’s time to get real about how data science and machine learning actually deliver value! Links: Tanya on Twitter; Vanishing Gradients on YouTube; Saving millions with a Shiny app | Data Science Hangout with Tanya Cashorali; Our next livestream: Research Data Science in Biotech with Eric Ma.
13 snips
Dec 14, 2022 • 1h 23min

Episode 16: Data Science and Decision Making Under Uncertainty

JD Long, agricultural economist and quant, discusses decision making under uncertainty in data science, common mistakes, heuristics for decision-making, and the impact of cognitive biases. Topics include coupling data science with decision-making, model building, storytelling, and the intersection of cognitive biases.
4 snips
Dec 7, 2022 • 54min

Episode 15: Uncertainty, Risk, and Simulation in Data Science

Hugo speaks with JD Long, agricultural economist, quant, and stochastic modeler, about decision making under uncertainty and how we can use our knowledge of risk, uncertainty, probabilistic thinking, causal inference, and more to help us use data science and machine learning to make better decisions in an uncertain world. This is part 1 of a two-part conversation. In part 1, we discuss risk, uncertainty, probabilistic thinking, and simulation, all with a view towards improving decision making, and we draw on examples from our personal lives, the pandemic, our jobs, the reinsurance space, and the corporate world. In part 2, we’ll get into the nitty-gritty of decision making under uncertainty. As JD says, and I paraphrase, “You may think you train your models, but your models are really training you.” Links: Vanishing Gradients' new YouTube channel; JD on Twitter; Executive Data Science, episode 5 of Vanishing Gradients, in which Jim Savage and Hugo talk through decision making and why you should always be integrating your loss function over your posterior; Fooled by Randomness by Nassim Taleb; Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner; Thinking in Bets by Annie Duke; The Signal and the Noise: Why So Many Predictions Fail by Nate Silver; Thinking, Fast and Slow by Daniel Kahneman.
10 snips
Nov 20, 2022 • 1h 9min

Episode 14: Decision Science, MLOps, and Machine Learning Everywhere

Hugo Bowne-Anderson discusses decision science, MLOps, and the ubiquity of machine learning models. Topics include decision-making under uncertainty, biases in data collection, MLOps and DevOps convergence, digital feedback loops, Google's search evolution, and the impact of modern algorithms on reality perception.
Oct 11, 2022 • 1h 23min

Episode 13: The Data Science Skills Gap, Economics, and Public Health

Hugo speaks with Norma Padron about data science education and continuous learning for people working in healthcare, broadly construed, along with how we can think about the democratization of data science skills more generally. Norma is CEO of EmpiricaLab, where her team’s mission is to bridge work and training and empower healthcare teams to focus on what they care about the most: patient care. In a word, EmpiricaLab is a platform focused on peer learning and last-mile training for healthcare teams. As you’ll discover, Norma’s background is fascinating: she has a Ph.D. in health policy and management from Yale University and a master's degree in economics from Duke University (among other things), and has worked with multiple early-stage digital health companies to accelerate their growth and scale. This is a wide-ranging conversation about how and where learning actually occurs, particularly with respect to data science. We talk about how the worlds of economics and econometrics, including causal inference, can be used to make data science a more robust and less fragile field, and why these disciplines are essential to both public and health policy. It was really invigorating to talk about the data skills gaps that exist in organizations and how Norma’s team at EmpiricaLab is thinking about solving them in the health space using a three-tiered solution of content creation, a social layer, and an information discovery platform. All of this is in service of a key question we’re facing in this field: how do you get the right data skills, tools, and workflows into the hands of the people who need them when the space is evolving so quickly? Links: Norma's website; EmpiricaLab; Norma on Twitter.
5 snips
Sep 30, 2022 • 1h 33min

Episode 12: Data Science for Social Media: Twitter and Reddit

Hugo speaks with Katie Bauer about her time working in data science at both Twitter and Reddit. At the time of recording, Katie was a data science manager at Twitter and, prior to that, a founding member of the data team at Reddit. She’s now Head of Data Science at Gloss Genius, so congrats on the new job, Katie! In this conversation, we dive into what types of challenges social media companies face that data science is equipped to solve. In doing so, we traverse the differences and similarities between companies such as Twitter and Reddit, the major differences between being an early member of a data team and joining an established data function at a larger organization, the supreme importance of robust measurement and telemetry in data science, and the mixed incentives for career data scientists, such as building flashy new things instead of maintaining existing infrastructure. I’ve always found conversations with Katie to be a treasure trove of insights into data science and machine learning practice, along with key learnings about data science management. In a word, Katie helps me to understand our space better. In this conversation, she told me that one important function data science can serve in any organization is creating a shared context for lots of different people in the org. We dive deep into what this actually means and how it can play out, traversing the world of dashboards, metric stores, feature stores, machine learning products, the need for top-down support, and much, much more.
Sep 16, 2022 • 1h 46min

Episode 11: Data Science: The Great Stagnation

Hugo speaks with Mark Saroufim, an Applied AI Engineer at Meta who works on PyTorch, where his team’s main focus is making it as easy as possible for people to deploy PyTorch in production outside Meta. Mark first came on our radar with an essay he wrote called Machine Learning: The Great Stagnation, which was concerned with the stagnation in machine learning in academic research and in which he stated, “Machine learning researchers can now engage in risk-free, high-income, high-prestige work. They are today’s Medieval Catholic priests.” This is just the tip of the iceberg of Mark’s critical and often sociological eye and one of the reasons I was excited to speak with him. In this conversation, we talk about the importance of open source software in modern data science and machine learning and how Mark thinks about making it as easy to use as possible. We also talk about risk assessments in considering whether to adopt open source or not, the supreme importance of good documentation, and what we can learn from the world of video game development when thinking about open source. We then dive into the rise of the machine learning cult leader persona, in the context of examples such as Hugging Face and the community they’ve built. We discuss the role of marketing in open source tooling, along with for-profit data science and ML tooling, how it can impact you as an end user, and how much of data science can be considered differing forms of live action role playing and simulation. We also talk about developer marketing and content for data professionals and how some of the largest names in ML research are those with gigantic Twitter followings, such as Andrej Karpathy. This is part of a broader trend in society about the skills that are required to capture significant mind share these days.
If that’s not enough, we jump into how machine learning ideally allows businesses to build sustainable and defensible moats, by which we mean the ability to maintain competitive advantages over competitors to retain market share. In between this interview and its release, PyTorch joined the Linux Foundation, which is something we’ll need to get Mark back to discuss sometime. Links: The Myth of Objective Tech Screens; Machine Learning: The Great Stagnation; Fear the Boom and Bust: Keynes vs. Hayek - The Original Economics Rap Battle!; History and the Security of Property by Nick Szabo; Mark on YouTube; Mark's Substack; Mark's Discord.
6 snips
Aug 18, 2022 • 1h 27min

Episode 10: Investing in Machine Learning

Hugo speaks with Sarah Catanzaro, General Partner at Amplify Partners, about investing in data science and machine learning tooling and where we see progress happening in the space. Sarah invests in the tools that we both wish we had earlier in our careers: tools that enable data scientists and machine learners to collect, store, manage, analyze, and model data more effectively. As you’ll discover, Sarah identifies as a scientist first and an investor second and still believes that her mission is to enable companies to become data-driven and to generate ROI through machine and statistical learning. In her words, she’s still that cuckoo kid who’s ranting and raving about how data and AI will shift every tide. In this conversation, we talk about what scientific inquiry actually is, the elements of playfulness and seriousness it necessarily involves, and how it can be used to generate business value. We talk about Sarah’s unorthodox path from a data scientist working in defense to her time at Palantir and how that led her to build out a data team and function for a venture capital firm and then to become a VC in the data tooling space. We then really dive into the data science and machine learning tooling space to figure out why it’s so fragmented: we look to the data analytics stack and software engineering communities to find historical tethers that may be useful. We discuss the moving parts that led to the establishment of a standard, a system of record, and clearly defined roles in analytics and what we can learn from that for machine learning! We also dive into the development of tools, workflows, and division of labour as partial exercises in pattern recognition and how this can be at odds with the variance we see in the machine learning landscape more generally! Two takeaways are that we need best practices and we need more standardization.
We also discussed which conversations we’re missing, given all our focus on tools. Sarah was adamant that we need to be focusing on questions, not solutions, and even questioning what ML is useful for and what it isn’t, diving into a bunch of thoughtful and nuanced examples. I’m also grateful that Sarah let me take her down a slightly dangerous and self-critical path where we riffed on both our roles in potentially contributing to the tragedy of the commons we’re all experiencing in the data tooling landscape: me working in tool building, developer relations, and marketing, and Sarah in venture capital.
