The MLOps Podcast

Dean Pleban @ DagsHub

A podcast from DagsHub about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who share their thoughts, best practices, and tips for taking machine learning to production.

Episodes

Nov 4, 2021 • 1h 9min

🎓 MLOps lessons learned helping companies build their ML systems with Lee Harper, Lead DS at Catapult

In this episode, I'm speaking with Lee Harper, Principal Data Scientist at Catapult Systems. Lee holds a Ph.D. in Physical and Theoretical Chemistry and is a teacher-turned-data scientist. We cover the various entry paths into data science, the value of diverse backgrounds, security in production ML, and even AI fairness.

Join our Discord community: https://discord.gg/tEYvqxwhah

Timestamps:
00:00 Podcast intro
01:00 Guest introduction
01:39 How did you get into the fields of data science and machine learning?
05:04 Coding boot camps vs. academia & diversity of backgrounds in ML
09:37 How has the process of bringing your work into production changed over the years?
13:02 How has the change in the languages used for data science affected production processes?
16:01 How do you shorten the path from POC to production in ML?
18:19 Do data scientists reinvent the wheel more often than software developers, and why?
22:14 The value of learning how to Google
23:00 Recurring themes, challenges, and common issues in data science
27:50 Solving for security in ML in production
31:57 ML security considerations for startups
34:30 Data security considerations in ML
35:18 What is the most interesting topic in machine learning right now?
38:05 ML fairness, bias, and responsible AI
41:44 What does it mean to build a fair or unbiased model?
47:15 If you had to choose one challenge in bringing models to production, what would it be?
51:00 What tools and processes do you use to make the transition to production easier?
55:35 About "vendor lock-in"
58:00 Your favorite tool recommendations
1:03:35 Recommendations for the audience

Relevant Links:
Linux Command Line and Shell Scripting Bible – https://www.amazon.com/Linux-Command-Shell-Scripting-Bible/dp/1119700914
Project Hail Mary – https://www.amazon.com/Project-Hail-Mary-Andy-Weir/dp/0593135202

Social Links:
https://www.linkedin.com/company/dagshub/
https://www.linkedin.com/company/catapult-systems/
https://www.linkedin.com/in/leeharper2425/
https://twitter.com/DeanPlbn
https://twitter.com/TheRealDAGsHub

Sep 20, 2021 • 1h 14min

🧠 Algorithmic challenges in bringing ML models into production with Roey Mechrez, CTO at BeyondMinds

In this episode, I'm speaking with Roey Mechrez from BeyondMinds. Roey holds a Ph.D. in Electrical Engineering and has extensive experience in computer vision and deep learning research. We discuss the challenges of gluing together infrastructure solutions into an end-to-end ML platform, generating monitoring insights for non-technical stakeholders, and combating catastrophic forgetting.

Join our Discord community: https://discord.gg/tEYvqxwhah

Timestamps:
00:00 Podcast intro
01:00 Guest intro
01:49 What does BeyondMinds do?
06:24 Audience for an end-to-end ML platform
12:14 Communicating with non-technical stakeholders/users
15:03 The future of "AI-powered tools", and human-machine collaboration
20:04 The biggest challenges in production ML: complex system orchestration, generating insights from monitoring, and catastrophic forgetting
25:23 Why is catastrophic forgetting a hard problem, and how do you deal with it?
30:02 "Secret" tips on how to get started with automating the retraining process
33:30 Generating monitoring insights and observations in a user-friendly format
38:12 Making data labeling issues explainable (automatically)
45:07 Customizing complex systems per user – Orchestrating an ML platform
52:58 API design in ML platform components
55:45 Measuring success for researchers, ML engineers, and software developers – Can ML work fit into an Agile workflow?
1:02:22 Is "time to production" a good metric? Gains in time to production in the real world
1:06:02 How do you divide the work between ML researchers and engineers?
1:08:39 Recommendations for the audience

Relevant Links:
A16z blog about AI
Data Science work in an agile environment – A talk by Dima Goldenberg
Hayot Kis (Hebrew Podcast) חיות כיס
Data Engineering Podcast
ACX Podcast

Social Links:
https://www.linkedin.com/company/beyondminds/
https://www.linkedin.com/company/dagshub/
https://twitter.com/roeyme
https://twitter.com/DeanPlbn
https://twitter.com/TheRealDAGsHub

Aug 11, 2021 • 46min

🐤 Feature stores and CI/CD for machine learning with Qwak.ai VP Engineering, Ran Romano

Ran Romano, VP Engineering at Qwak.ai, discusses the evolution of job titles in machine learning, the challenges of using Jupyter notebooks in production, and the importance of adopting a CI/CD approach. The conversation also covers the challenges of scaling ML models in production, ensuring data reproducibility, and using open-source solutions in Qwak's ML platform.

Jul 4, 2021 • 1h 19min

🤗 Large ML models in production with HuggingFace CTO Julien Chaumond

In this episode, I'm speaking with Julien Chaumond from 🤗 HuggingFace about how they got started, getting large language models to production with millisecond inference times, and the CERN for machine learning.

Join our Discord community: https://discord.gg/tEYvqxwhah

Timestamps:
01:00 - Guest intro
02:14 - Origin of HuggingFace
05:37 - Why the focus on NLP?
07:45 - The success of the HuggingFace community
13:14 - Reproducing models and scaling for the community
18:14 - Enabling large models in production
23:14 - How HuggingFace scales so many models
27:34 - The biggest challenge HuggingFace solved in MLOps
32:02 - How HuggingFace transitions from research to production
34:44 - Using notebooks vs. Python modules
38:27 - The most interesting topic in ML production
40:10 - Fascinating ML research
45:24 - Learning new things
51:14 - Something that is true but most people disagree with
56:54 - Tips to organize research teams
1:00:05 - New features for accelerated inference
1:01:35 - Most common use case of HuggingFace
1:04:17 - Integrating search algorithms into the Transformers library
1:05:09 - Integrating vision models
1:06:06 - Long-term business model
1:10:55 - Automation and simplification of the process of building models
1:13:02 - Support for real-time inference
1:14:40 - Recommendations for the audience

Relevant Links:
FastDS: https://github.com/DAGsHub/fds
BigScience: https://bigscience.huggingface.co

Social Links:
https://www.linkedin.com/company/dagshub/
https://www.linkedin.com/company/huggingface/
https://twitter.com/TheRealDAGsHub
https://twitter.com/huggingface

Apr 27, 2021 • 1h 1min

🛣 Finding your path in ML with NLP Engineer Urszula Czerwinska

In this episode, I'm speaking with Urszula Czerwinska about her path as a data scientist, the projects she has worked on and the experience she gained from them, and the challenges she has overcome in bringing her machine learning (ML) projects into production.

Join our Discord community: https://discord.gg/tEYvqxwhah

Timestamps:
0:00 - Podcast intro
1:15 - Guest intro and how you got into data science
3:48 - Finding your fit: research or industry, and when to transition
7:23 - What types of ML projects do you specialize in?
10:41 - ML explainability and interpretability
15:26 - ML explainability with non-technical stakeholders
17:13 - What problems does your team solve within the organization?
20:56 - ML in production – how to bring your ML projects from research to production
25:17 - The tools you can't live without
28:11 - Do you have a set process for productizing ML projects?
30:08 - Team structures and communication for data science teams
33:42 - Who's in charge of setting up infrastructure for a project, and a discussion of job titles
36:29 - Interesting tools and repositories you work with
39:30 - How do you stay up to date?
42:00 - Biggest challenges for you in ML
45:12 - Favorite and least favorite thing about being a data scientist
49:52 - Handling a workplace that doesn't understand what a data scientist is
53:07 - Data scientists are 🦄
53:30 - Good papers you read recently
58:12 - Tips to improve the data science workflow

Relevant Links:
- flair: https://github.com/flairNLP/flair
- AllenNLP: https://github.com/allenai/allennlp
- Papers with Code: https://paperswithcode.com/
- Dair.ai newsletter: https://dair.ai/newsletter/
- HuggingFace: https://huggingface.co/blog
