
AI Engineering Podcast
This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and the considerations involved in building or customizing new models. Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.
Latest episodes

Nov 8, 2023 • 51min
Validating Machine Learning Systems For Safety Critical Applications With Ketryx
Erez Kaminski, an expert in validating machine learning systems for safety critical applications, discusses the regulatory burdens on ML teams in medical applications, the challenges of validating ML systems, and opportunities for automating overhead. He also shares insights into the excitement in the medical field for improving medical applications and highlights the benefits of using Ketryx for building medical software.

Oct 24, 2023 • 46min
Applying Declarative ML Techniques To Large Language Models For Better Results
Summary
Large language models have gained a substantial amount of attention in the area of AI and machine learning. While they are impressive, there are many applications where they are not the best option. In this episode Piero Molino explains how declarative ML approaches allow you to make the best use of the available tools across use cases and data formats.
Announcements
Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
Your host is Tobias Macey and today I'm interviewing Piero Molino about the application of declarative ML in a world being dominated by large language models.
Interview
Introduction
How did you get involved in machine learning?
Can you start by summarizing your perspective on the effect that LLMs are having on the AI/ML industry? In a world where LLMs are being applied to a growing variety of use cases, what are the capabilities that they still lack?
How does declarative ML help to address those shortcomings?
The majority of current hype is about commercial models (e.g. GPT-4). Can you summarize the current state of the ecosystem for open source LLMs? For teams who are investing in ML/AI capabilities, what are the sources of platform risk for LLMs?
What are the comparative benefits of using a declarative ML approach?
What are the most interesting, innovative, or unexpected ways that you have seen LLMs used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on declarative ML in the age of LLMs?
When is an LLM the wrong choice?
What do you have planned for the future of declarative ML and Predibase?
Contact Info
LinkedIn, Website
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Predibase (Podcast Episode), Ludwig (Podcast.__init__ Episode), Recommender Systems, Information Retrieval, Vector Database, Transformer Model, BERT, Context Windows, LLAMA
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Oct 15, 2023 • 1h 3min
Surveying The Landscape Of AI and ML From An Investor's Perspective
Summary
Artificial Intelligence is experiencing a renaissance in the wake of breakthrough natural language models. With new businesses sprouting up to address the various needs of ML and AI teams across the industry, it is a constant challenge to stay informed. Matt Turck has been compiling a report on the state of ML, AI, and Data for his work at FirstMark Capital. In this episode he shares his findings on the ML and AI landscape and the interesting trends that are developing.
Announcements
As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.
Your host is Tobias Macey and today I'm interviewing Matt Turck about his work on the MAD (ML, AI, and Data) landscape and the insights he has gained on the ML ecosystem.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what the MAD landscape project is and the story behind it?
What are the major changes in the ML ecosystem that you have seen since you first started compiling the landscape? How have the developments in consumer-grade AI in recent years changed the business opportunities for ML/AI?
What are the coarse divisions that you see as the boundaries that define the different categories for ML/AI in the landscape?
For ML infrastructure products/companies, what are the biggest challenges that they face in engineering and customer acquisition?
What are some of the challenges in building momentum for startups in AI (existing moats around data access, talent acquisition, etc.)? For products/companies that have ML/AI as their core offering, what are some strategies that they use to compete with "big tech" companies that already have a large corpus of data?
What do you see as the societal vs. business importance of open source models as AI becomes more integrated into consumer facing products?
What are the most interesting, innovative, or unexpected ways that you have seen ML/AI used in business and social contexts?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on the ML/AI elements of the MAD landscape?
When is ML/AI the wrong choice for businesses?
What are the areas of ML/AI that you are paying closest attention to in your own work?
Contact Info
Website, @mattturck on Twitter
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
MAD Landscape (Data Engineering Podcast Episode), First Mark Capital, Bayesian Techniques, Hadoop, ChatGPT, AutoGPT, Dataiku, Generative AI, Databricks, MLOps, OpenAI, Anthropic, DeepMind, BloombergGPT, HuggingFace, Jexi Movie, "Her" Movie, Synthesia

Sep 11, 2023 • 50min
Applying Federated Machine Learning To Sensitive Healthcare Data At Rhino Health
Summary
A core challenge of machine learning systems is getting access to quality data. This often means centralizing information in a single system, but that is impractical in highly regulated industries, such as healthcare. To address this hurdle, Rhino Health is building a platform for federated learning on health data, so that everyone can maintain data privacy while benefiting from AI capabilities. In this episode Ittai Dayan explains the barriers to ML in healthcare and how they have designed the Rhino platform to overcome them.
Announcements
Your host is Tobias Macey and today I'm interviewing Ittai Dayan about using federated learning at Rhino Health to bring AI capabilities to the tightly regulated healthcare industry.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what Rhino Health is and the story behind it?
What is federated learning and what are the trade-offs that it introduces? What are the benefits to healthcare and pharmacological organizations from using federated learning?
What are some of the challenges that you face in validating that patient data is properly de-identified in the federated models?
Can you describe what the Rhino Health platform offers and how it is implemented? How have the design and goals of the system changed since you started working on it?
What are the technological capabilities that are needed for an organization to be able to start using Rhino Health to gain insights into their patient and clinical data? How have you approached the design of your product to reduce the effort to onboard new customers and solutions?
What are some examples of the types of automation that you are able to provide to your customers? (e.g. medical diagnosis, radiology review, health outcome predictions, etc.)
What are the ethical and regulatory challenges that you have had to address in the development of your platform?
What are the most interesting, innovative, or unexpected ways that you have seen Rhino Health used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Rhino Health?
When is Rhino Health the wrong choice?
What do you have planned for the future of Rhino Health?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Rhino Health, Federated Learning, Nvidia Clara, Nvidia DGX, Melloddy, Flair NLP

Jun 17, 2023 • 43min
Using Machine Learning To Keep An Eye On The Planet
Summary
Satellite imagery has given us a new perspective on our world, but it is limited by the field of view for the cameras. Synthetic Aperture Radar (SAR) allows for collecting images through clouds and in the dark, giving us a more consistent means of collecting data. In order to identify interesting details in such a vast amount of data it is necessary to use the power of machine learning. ICEYE has a fleet of satellites continuously collecting information about our planet. In this episode Tapio Friberg shares how they are applying ML to that data set to provide useful insights about fires, floods, and other terrestrial phenomena.
Announcements
Your host is Tobias Macey and today I'm interviewing Tapio Friberg about building machine learning applications on top of SAR (Synthetic Aperture Radar) data to generate insights about our planet.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what ICEYE is and the story behind it?
What are some of the applications of ML at ICEYE?
What are some of the ways that SAR data poses a unique challenge to ML applications?
What are some of the elements of the ML workflow that you are able to use "off the shelf" and where are the areas that you have had to build custom solutions?
Can you share the structure of your engineering team and the role that the ML function plays in the larger organization?
What does the end-to-end workflow for your ML model development and deployment look like? What are the operational requirements for your models? (e.g. batch execution, real-time, interactive inference, etc.)
In the model definitions, what are the elements of the source domain that create the largest challenges? (e.g. noise from backscatter, variance in resolution, etc.)
Once you have an output from an ML model how do you manage mapping between data domains to reflect insights from SAR sources onto a human understandable representation?
Given that SAR data and earth imaging is still a very niche domain, how does that influence your ability to hire for open positions and the ways that you think about your contributions to the overall ML ecosystem?
How can your work on using SAR as a representation of physical attributes help to improve capabilities in e.g. LIDAR, computer vision, etc.?
What are the most interesting, innovative, or unexpected ways that you have seen ICEYE and SAR data used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ML for SAR data?
What do you have planned for the future of ML applications at ICEYE?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
ICEYE, SAR == Synthetic Aperture Radar, Transfer Learning

May 29, 2023 • 47min
The Role Of Model Development In Machine Learning Systems
Josh Tobin discusses the shift in focus from model development to machine learning systems, the evolution of modeling in the machine learning ecosystem, the capabilities of Gantry in enhancing model performance and maintenance, core capabilities and flexible support for machine learning, innovative approaches and challenges in building and deploying machine learning models, and when to choose Gantry for model development and maintenance.

Mar 9, 2023 • 35min
Real-Time Machine Learning Has Entered The Realm Of The Possible
Summary
Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.
Announcements
Your host is Tobias Macey and today I'm interviewing Kevin Stumpf about the challenges and promise of real-time ML applications.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what real-time ML is and some examples of where it might be applied?
What are the operational and organizational requirements for being able to adopt real-time approaches for ML projects?
What are some of the ways that real-time requirements influence the scale/scope/architecture of an ML model?
What are some of the failure modes for real-time vs analytical or operational ML?
Given the low latency between source/input data being generated or received and a prediction being generated, how does that influence susceptibility to e.g. data drift? Data quality and accuracy also become more critical. What are some of the validation strategies that teams need to consider as they move to real-time?
What are the most interesting, innovative, or unexpected ways that you have seen real-time ML applied?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on real-time ML systems?
When is real-time the wrong choice for ML?
What do you have planned for the future of real-time support for ML in Tecton?
Contact Info
LinkedIn, @kevinmstumpf on Twitter
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Tecton (Podcast Episode, Data Engineering Podcast Episode), Uber Michelangelo, Reinforcement Learning, Online Learning, Random Forest, ChatGPT, XGBoost, Linear Regression, Train-Serve Skew, Flink (Data Engineering Podcast Episode)

Feb 2, 2023 • 1h 6min
How Shopify Built A Machine Learning Platform That Encourages Experimentation
Summary
Shopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely.
Announcements
Your host is Tobias Macey and today I'm interviewing Isaac Vidas about his work on the ML platform used by Shopify.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what Shopify is and some of the ways that you are using ML at Shopify? What are the challenges that you have encountered as an organization in applying ML to your business needs?
Can you describe how you have designed your current technical platform for supporting ML workloads? Who are the target personas for this platform?
What does the workflow look like for a given data scientist/ML engineer/etc.?
What are the capabilities that you are trying to optimize for in your current platform? What are some of the previous iterations of ML infrastructure and process that you have built?
What are the most useful lessons that you gathered from those previous experiences that informed your current approach?
How have the capabilities of the Merlin platform influenced the ways that ML is viewed and applied across Shopify?
What are the most interesting, innovative, or unexpected ways that you have seen Merlin used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Merlin?
When is Merlin the wrong choice?
What do you have planned for the future of Merlin?
Contact Info
@kazuaros on Twitter, LinkedIn, kazuar on GitHub
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Shopify, Shopify Merlin, Vertex AI, scikit-learn, XGBoost, Ray (Podcast.__init__ Episode), PySpark, GPT-3, ChatGPT, Google AI, PyTorch (Podcast.__init__ Episode), Dask, Modin (Podcast.__init__ Episode), Flink (Data Engineering Podcast Episode), Feast Feature Store, Kubernetes

Jan 24, 2023 • 59min
Applying Machine Learning To The Problem Of Bad Data At Anomalo
Summary
All data systems are subject to the "garbage in, garbage out" problem. For machine learning applications, bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. In this episode Jeremy Stanley discusses the various challenges that are involved in building useful and reliable machine learning models with unreliable data and the interesting problems that they are solving in the process.
Announcements
Your host is Tobias Macey and today I'm interviewing Jeremy Stanley about his work at Anomalo, applying ML to the problem of data quality monitoring.
Interview
Introduction
How did you get involved in machine learning?
Can you describe what Anomalo is and the story behind it?
What are some of the ML approaches that you are using to address challenges with data quality/observability?
What are some of the difficulties posed by your application of ML technologies on data sets that you don't control? How does the scale and quality of data that you are working with influence/constrain the algorithmic approaches that you are using to build and train your models?
How have you implemented the infrastructure and workflows that you are using to support your ML applications?
What are some of the ways that you are addressing data quality challenges in your own platform? What are the opportunities that you have for dogfooding your product?
What are the most interesting, innovative, or unexpected ways that you have seen Anomalo used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Anomalo?
When is Anomalo the wrong choice?
What do you have planned for the future of Anomalo?
Contact Info
@jeremystan on Twitter, LinkedIn
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Anomalo (Data Engineering Podcast Episode), Partial Differential Equations, Neural Network, Neural Networks For Pattern Recognition by Christopher M. Bishop (affiliate link), Gradient Boosted Decision Trees, Shapley Values, Sentry, dbt, Altair

Dec 2, 2022 • 46min
Build More Reliable Machine Learning Systems With The Dagster Orchestration Engine
Summary
Building a machine learning model one time can be done in an ad-hoc manner, but if you ever want to update it and serve it in production you need a way of repeating a complex sequence of operations. Dagster is an orchestration engine that understands the data that it is manipulating so that you can move beyond coarse task-based representations of your dependencies. In this episode Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project and the foundational principles that it is built on to allow for collaboration across data engineering and machine learning concerns.
Interview
Introduction
How did you get involved in machine learning?
Can you start by sharing a definition of "orchestration" in the context of machine learning projects?
What is your assessment of the state of the orchestration ecosystem as it pertains to ML?
modeling cycles and managing experiment iterations in the execution graph
how to balance flexibility with repeatability
What are the most interesting, innovative, or unexpected ways that you have seen orchestration implemented/applied for machine learning?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on orchestration of ML workflows?
When is Dagster the wrong choice?
What do you have planned for the future of ML support in Dagster?
Contact Info
LinkedIn, @s_ryz on Twitter, sryza on GitHub
Parting Question
From your perspective, what is the biggest barrier to adoption of machine learning today?
Links
Dagster (Data Engineering Podcast Episode), Cloudera, Hadoop, Apache Spark, Peter Norvig, Josh Wills, REPL == Read Eval Print Loop, RStudio, Memoization, MLFlow, Kedro (Data Engineering Podcast Episode), Metaflow (Podcast.__init__ Episode), Kubeflow, dbt (Data Engineering Podcast Episode), Airbyte (Data Engineering Podcast Episode)