
AI Engineering Podcast

Latest episodes

Sep 28, 2022 • 52min

Solve The Cold Start Problem For Machine Learning By Letting Humans Teach The Computer With Aitomatic

Summary

Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!

Your host is Tobias Macey and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what the "cold start" or "small data" problem is and its impact on an organization’s ability to invest in machine learning?
What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data?
How does the model design influence the data requirements to build it? (e.g. statistical model vs. deep learning, etc.)
What are the available options for addressing a lack of data for ML?
What are the characteristics of a given data set that make it suitable for ML use cases?
Can you describe what you are building at Aitomatic and how it helps to address the cold start problem?
How have the design and goals of the product changed since you first started working on it?
What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations?
What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st?
When is a human/knowledge driven approach to ML development the wrong choice?
What do you have planned for the future of Aitomatic?

Contact Info

LinkedIn
@pentagoniac on Twitter
Google Scholar

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Aitomatic
Human First AI
Knowledge First World Symposium
Atari 800
Cold start problem
Scale AI
Snorkel AI (Podcast Episode)
Anomaly Detection
Expert Systems
ICML == International Conference on Machine Learning
NIST == National Institute of Standards and Technology
Multi-modal Model
SVM == Support Vector Machine
Tensorflow
Pytorch (Podcast.__init__ Episode)
OSS Capital
DALL-E

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
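To make the "cold start" pattern concrete: the general idea is to serve predictions from codified human expertise until enough labeled data accumulates to train a statistical model. Here is a minimal sketch of that pattern in Python — an illustration of the concept only, not Aitomatic’s or H1st’s actual API, and the rule, threshold, and feature names are hypothetical:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

class KnowledgeFirstModel:
    """Serve a human-written rule until enough labeled examples
    accumulate to train a data-driven model (hypothetical pattern,
    not the actual Aitomatic/H1st API)."""

    def __init__(self, expert_rule, min_examples=500):
        self.expert_rule = expert_rule      # human-encoded heuristic
        self.min_examples = min_examples    # switchover threshold
        self.X, self.y = [], []
        self.vec, self.clf = DictVectorizer(), None

    def record(self, features, label):
        """Accumulate ground truth observed in production."""
        self.X.append(features)
        self.y.append(label)
        if self.clf is None and len(self.y) >= self.min_examples:
            # Enough data: train a model on what the rule has seen.
            self.clf = LogisticRegression().fit(
                self.vec.fit_transform(self.X), self.y)

    def predict(self, features):
        if self.clf is not None:
            return self.clf.predict(self.vec.transform([features]))[0]
        return self.expert_rule(features)   # cold start: trust the human

# e.g. an equipment-monitoring rule from a domain expert:
model = KnowledgeFirstModel(lambda f: int(f["temperature"] > 90), min_examples=100)
print(model.predict({"temperature": 95}))   # 1, from the rule
```

A production version would blend the rule and the model rather than switching abruptly, but the shape of the workflow is the same.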
Sep 21, 2022 • 52min

Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

Summary

Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to structure data in a way that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video, etc.) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!

Your host is Tobias Macey and today I’m interviewing Frank Liu about how to use vector embeddings in your ML projects and how Towhee can reduce the effort involved.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Towhee is and the story behind it?
What is the problem that Towhee is aimed at solving?
What are the elements of generating vector embeddings that pose the greatest challenge or require the most effort?
Once you have an embedding, what are some of the ways that it might be used in a machine learning project?
Are there any design considerations that need to be addressed in the form that an embedding takes and how it impacts the resultant model that relies on it? (whether for training or inference)
Can you describe how the Towhee framework is implemented?
What are some of the interesting engineering challenges that needed to be addressed?
How have the design/goals/scope of the project shifted since it began?
What is the workflow for someone using Towhee in the context of an ML project?
What are some of the types of optimizations that you have incorporated into Towhee?
What are some of the scaling considerations that users need to be aware of as they increase the volume or complexity of data that they are processing?
What are some of the ways that using Towhee impacts the way a data scientist or ML engineer approaches the design and development of their model code?
What are the interfaces available for integrating with and extending Towhee?
What are the most interesting, innovative, or unexpected ways that you have seen Towhee used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Towhee?
When is Towhee the wrong choice?
What do you have planned for the future of Towhee?

Contact Info

LinkedIn
fzliu on GitHub
Website
@frankzliu on Twitter

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Towhee
Zilliz
Milvus (Data Engineering Podcast Episode)
Computer Vision
Tensor
Autoencoder
Latent Space
Diffusion Model
HSL == Hue, Saturation, Lightness
Weights and Biases

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
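For a sense of what the episode describes, here is a minimal sketch of generating an image embedding with Towhee’s declarative pipeline API. The operator names and result-access call follow the library’s documented examples but may differ between releases, and the image file is hypothetical:

```python
from towhee import pipe, ops

# Declarative pipeline: decode an image from disk, then embed it with a
# pretrained vision model so it can be indexed in a vector database.
image_embedding = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', 'vec', ops.image_embedding.timm(model_name='resnet50'))
        .output('vec')
)

vec = image_embedding('cat.jpg').get()[0]  # 'cat.jpg' is a placeholder file
print(vec.shape)                           # e.g. (2048,) for a ResNet-50 backbone
```

The resulting vectors would typically land in a system like Milvus (linked above) for similarity search.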
Sep 14, 2022 • 1h 3min

Shedding Light On Silent Model Failures With NannyML

Summary

Because machine learning models are constantly interacting with inputs from the real world they are subject to a wide variety of failures. The most commonly discussed error condition is concept drift, but there are numerous other ways that things can go wrong. In this episode Wojtek Kuberski explains how NannyML is designed to compare the predicted performance of your model against its actual behavior to identify silent failures and provide context to allow you to determine whether and how urgently to address them.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!

Your host is Tobias Macey and today I’m interviewing Wojtek Kuberski about NannyML and the work involved in post-deployment data science.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what NannyML is and the story behind it?
What is "post-deployment data science"?
How does it differ from the metrics/monitoring approach to managing the model lifecycle?
Who is typically responsible for this work?
How does NannyML augment their skills?
What are some of your experiences with model failure that motivated you to spend your time and focus on this problem?
What are the main contributing factors to alert fatigue for ML systems?
What are some of the ways that a model can fail silently?
How does NannyML detect those conditions?
What are the remediation actions that might be necessary once an issue is detected in a model?
Can you describe how NannyML is implemented?
What are some of the technical and UX design problems that you have had to address?
What are some of the ideas/assumptions that you have had to re-evaluate in the process of building NannyML?
What additional capabilities are necessary for supporting less structured data?
Can you describe what is involved in setting up NannyML and how it fits into an ML engineer’s workflow?
Once a model is deployed, what additional outputs/data can/should be collected to improve the utility of NannyML and feed into analysis of the real-world operation?
What are the most interesting, innovative, or unexpected ways that you have seen NannyML used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on NannyML?
When is NannyML the wrong choice?
What do you have planned for the future of NannyML?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

NannyML
F1 Score
ROC Curve
Concept Drift
A/B Testing
Jupyter Notebook
Vector Embedding
Airflow
EDA == Exploratory Data Analysis
Inspired book (affiliate link)
ZenML

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
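As a concrete example of estimating performance when ground-truth labels have not yet arrived, here is a sketch using NannyML’s confidence-based performance estimation (CBPE). The dataset loader, column names, and plotting call follow the library’s quickstart from around the time of this episode and may have changed in later versions:

```python
import nannyml as nml

# Synthetic binary-classification data shipped with the library:
# a labeled reference period and an unlabeled analysis period.
reference, analysis, _ = nml.load_synthetic_binary_classification_dataset()

# CBPE estimates post-deployment performance from predicted
# probabilities, so it can flag degradation before labels arrive.
estimator = nml.CBPE(
    y_pred_proba='y_pred_proba',
    y_pred='y_pred',
    y_true='work_home_actual',
    timestamp_column_name='timestamp',
    metrics=['roc_auc'],
    chunk_size=5000,
)
estimator.fit(reference)               # calibrate on data with known labels
results = estimator.estimate(analysis)
results.plot().show()                  # estimated ROC AUC per chunk, with alerts
```

The point of the technique is exactly the "silent failure" problem from the episode: the estimated metric can drop well before any labeled feedback confirms it.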
Sep 10, 2022 • 54min

How To Design And Build Machine Learning Systems For Reasonable Scale

Summary

Using machine learning in production requires a sophisticated set of cooperating technologies. A majority of resources that are available for understanding how to design and operate these platforms are focused on either simple examples that don’t scale, or over-engineered technologies designed for the massive scale of big tech companies. In this episode Jacopo Tagliabue shares his vision for "ML at reasonable scale" and how you can adopt these patterns for building your own platforms.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.

Your host is Tobias Macey and today I’m interviewing Jacopo Tagliabue about building "reasonable scale" ML systems.

Interview

Introduction
How did you get involved in machine learning?
How would you describe the current state of the ecosystem for ML practitioners? (e.g. tool selection, availability of information/tutorials, etc.)
What are some of the notable changes that you have seen over the past 2 – 5 years?
How have the evolutions in the data engineering space been reflected in/influenced the way that ML is being done?
What are the challenges/points of friction that ML practitioners have to contend with when trying to get a model into production that isn’t just a toy?
You wrote a set of tutorials and accompanying code about performing ML at "reasonable scale". What are you aiming to represent with that phrasing?
There is a paradox of choice for any newcomer to ML. What are some of the key capabilities that practitioners should use in their decision rubric when designing a "reasonable scale" system?
What are some of the common bottlenecks that crop up when moving from an initial test implementation to a scalable deployment that is serving customer traffic?
How much of an impact does the type of ML problem being addressed have on the deployment and scalability elements of the system design? (e.g. NLP vs. computer vision vs. recommender system, etc.)
What are some of the misleading pieces of advice that you have seen from "big tech" tutorials about how to do ML that are unnecessary when running at smaller scales?
You also spend some time discussing the benefits of a "NoOps" approach to ML deployment. At what point do operations/infrastructure engineers need to get involved?
What are the operational aspects of ML applications that infrastructure engineers working in product teams might be unprepared for?
What are the most interesting, innovative, or unexpected system designs that you have seen for moderate scale MLOps?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ML system design and implementation?
What are the aspects of ML systems design that you are paying attention to in the current ecosystem?
What advice do you have for additional references or research that ML practitioners would benefit from when designing their own production systems?

Contact Info

jacopotagliabue on GitHub
Website
LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

The Post-Modern Stack: ML At Reasonable Scale
Coveo
NLP == Natural Language Processing
RecList
Part of speech tagging
Markov Model
YDNABB (You Don’t Need A Bigger Boat)
dbt (Data Engineering Podcast Episode)
Seldon
Metaflow (Podcast.__init__ Episode)
Snowflake
Information Retrieval
Modern Data Stack
SQLite
Spark SQL
AWS Athena
Keras
PyTorch
Luigi
Airflow
Flask
AWS Fargate
AWS Sagemaker
Recommendations At Reasonable Scale
Pinecone (Data Engineering Podcast Episode)
Redis
KNN == K-Nearest Neighbors
Pinterest Engineering Blog
Materialize
OpenAI

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
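Metaflow, listed in the links, is a good example of the "reasonable scale" philosophy: pipelines are plain Python classes that run locally and scale out to the cloud without bespoke infrastructure. A minimal sketch (the CSV file and feature columns are hypothetical):

```python
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):
    """A 'reasonable scale' training pipeline: versioned, resumable,
    and runnable locally or on managed cloud compute unchanged."""

    @step
    def start(self):
        import pandas as pd
        # Hypothetical extract pulled from the warehouse ahead of time.
        self.df = pd.read_csv('events.csv')
        self.next(self.train)

    @step
    def train(self):
        from sklearn.linear_model import LogisticRegression
        X, y = self.df.drop(columns=['label']), self.df['label']
        self.model = LogisticRegression().fit(X, y)  # stored as a versioned artifact
        self.next(self.end)

    @step
    def end(self):
        print('trained:', self.model)

if __name__ == '__main__':
    TrainFlow()
```

Running `python train_flow.py run` executes the DAG locally, and the same file can be dispatched to AWS Batch with `--with batch`, which is the kind of NoOps scaling the episode advocates.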
Sep 9, 2022 • 59min

Building A Business Powered By Machine Learning At Assembly AI

Summary

The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!

Your host is Tobias Macey and today I’m interviewing Dylan Fox about building and growing a business with ML as its core offering.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Assembly is and the story behind it?
For anyone who isn’t familiar with your platform, can you describe the role that ML/AI plays in your product?
What was your process for going from idea to prototype for an AI powered business?
Can you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications?
How are you structuring your teams?
On the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out?
How do you think about scoping of model functionality to balance composability and system complexity?
What is your process for identifying and understanding which problems are suited to ML and when to rely on pure software?
You are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles?
What are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model?
What is your process for understanding what’s possible and what can feasibly operate at scale?
Can you describe your overall operational patterns and delivery process for ML?
What are some of the most useful investments in tooling that you have made to manage the development experience for your teams?
Once you have a model in operation, how do you manage performance tuning? (from both a model and an operational scalability perspective)
What are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly?
When is ML the wrong choice?
What do you have planned for the future of Assembly?

Contact Info

@YouveGotFox on Twitter
LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Assembly AI (Podcast.__init__ Episode)
Learn Python the Hard Way
NLTK
NLP == Natural Language Processing
NLU == Natural Language Understanding
Speech Recognition
Tensorflow
r/machinelearning
SciPy
PyTorch
Jax
HuggingFace
RNN == Recurrent Neural Network
CNN == Convolutional Neural Network
LSTM == Long Short Term Memory
Hidden Markov Models
Baidu DeepSpeech
CTC (Connectionist Temporal Classification) Loss Model
Twilio
Grid Search
K80 GPU
A100 GPU
TPU == Tensor Processing Unit
Foundation Models
BLOOM Language Model
DALL-E 2

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
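To illustrate the "ML as an API" product shape discussed here, a sketch of calling a speech-to-text service such as Assembly AI’s, based on the v2 REST API they documented around the time of this episode. The API key, audio URL, and polling interval are placeholders, and response field names may differ:

```python
import requests
import time

API = 'https://api.assemblyai.com/v2'
headers = {'authorization': 'YOUR_API_KEY'}  # hypothetical placeholder key

# Submit a publicly reachable audio file for asynchronous transcription.
job = requests.post(f'{API}/transcript',
                    json={'audio_url': 'https://example.com/meeting.mp3'},
                    headers=headers).json()

# Poll until the job finishes; the heavy lifting happens server-side.
while True:
    result = requests.get(f"{API}/transcript/{job['id']}", headers=headers).json()
    if result['status'] in ('completed', 'error'):
        break
    time.sleep(3)

print(result.get('text'))
```

The interesting part of the episode is everything behind that endpoint: model iteration, GPU capacity planning, and deciding when a capability is reliable enough to expose.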
Aug 26, 2022 • 1h 15min

Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River

Summary

The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!

Your host is Tobias Macey and today I’m interviewing Max Halford about River, a Python toolkit for streaming and online machine learning.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what River is and the story behind it?
What is "online" machine learning?
What are the practical differences with batch ML?
Why is batch learning so predominant?
What are the cases where someone would want/need to use online or streaming ML?
The prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like?
Concept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift?
Can you describe how the River framework is implemented?
How have the design and goals of the project changed since you started working on it?
How do the internal representations of the model differ from batch learning to allow for incremental updates to the model state?
In the documentation you note the use of Python dictionaries for state management and the flexibility offered by that choice. What are the benefits and potential pitfalls of that decision?
Can you describe the process of using River to design, implement, and validate a streaming ML model?
What are the operational requirements for deploying and serving the model once it has been developed?
What are some of the challenges that users of River might run into if they are coming from a batch learning background?
What are the most interesting, innovative, or unexpected ways that you have seen River used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on River?
When is River the wrong choice?
What do you have planned for the future of River?

Contact Info

Email
@halford_max on Twitter
MaxHalford on GitHub

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

River
scikit-multiflow
Federated Machine Learning
Hogwild! Google Paper
Chip Huyen concept drift blog post
Dan Crenshaw Berkeley Clipper MLOps
Robustness Principle
NY Taxi Dataset
RiverTorch
River Public Roadmap
Beaver tool for deploying online models
Prodigy ML human in the loop labeling

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
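The loop described in the episode inverts the batch workflow: the model is updated one example at a time, with each observation scored before the model learns from it ("progressive validation", or test-then-train). A minimal sketch with one of River’s built-in datasets:

```python
from river import compose, datasets, linear_model, metrics, preprocessing

# An online pipeline: features are scaled and the model is updated
# one observation at a time, so it adapts as the stream drifts.
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression(),
)
metric = metrics.ROCAUC()

for x, y in datasets.Phishing():         # a small built-in binary stream
    y_pred = model.predict_proba_one(x)  # predict first (test-then-train)
    model.learn_one(x, y)                # single-example update, no retraining
    metric.update(y, y_pred)

print(metric)
```

Note that `x` is a plain Python dictionary of features, which is the state-management choice the interview digs into.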
Aug 16, 2022 • 1h 8min

Using AI To Transform Your Business Without The Headache Using Graft

Summary

Machine learning is a transformative tool for the organizations that can take advantage of it. While the frameworks and platforms for building machine learning applications are becoming more powerful and broadly available, there is still a significant investment of time, money, and talent required to take full advantage of it. In order to reduce that barrier further Adam Oliner and Brian Calvert, along with their other co-founders, started Graft. In this episode Adam and Brian explain how they have built a platform designed to empower everyone in the business to take part in designing and building ML projects, while managing the end-to-end workflow required to go from data to production.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!

Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!

Your host is Tobias Macey and today I’m interviewing Brian Calvert and Adam Oliner about Graft, a cloud-native platform designed to simplify the work of applying AI to business problems.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Graft is and the story behind it?
What is the core thesis of the problem you are targeting?
How does the Graft product address that problem?
Who are the personas that you are focused on working with both now in your early stages and in the future as you evolve the product?
What are the capabilities that can be unlocked in different organizations by reducing the friction and up-front investment required to adopt ML/AI?
What are the user-facing interfaces that you are focused on providing to make that adoption curve as shallow as possible?
What are some of the unavoidable bits of complexity that need to be surfaced to the end user?
Can you describe the infrastructure and platform design that you are relying on for the Graft product?
What are some of the emerging "best practices" around ML/AI that you have been able to build on top of?
As new techniques and practices are discovered/introduced how are you thinking about the adoption process and how/when to integrate them into the Graft product?
What are some of the new engineering challenges that you have had to tackle as a result of your specific product?
Machine learning can be a very data and compute intensive endeavor. How are you thinking about scalability in a multi-tenant system?
Different model and data types can be widely divergent in terms of the cost (monetary, time, compute, etc.) required. How are you thinking about amortizing vs. passing through those costs to the end user?
Can you describe the adoption/integration process for someone using Graft?
Once they are onboarded and they have connected to their various data sources, what is the workflow for someone to apply ML capabilities to their problems?
One of the challenges about the current state of ML capabilities and adoption is understanding what is possible and what is impractical. How have you designed Graft to help identify and expose opportunities for applying ML within the organization?
What are some of the challenges of customer education and overall messaging that you are working through?
What are the most interesting, innovative, or unexpected ways that you have seen Graft used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Graft?
When is Graft the wrong choice?
What do you have planned for the future of Graft?

Contact Info

Brian LinkedIn
Adam LinkedIn

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Graft
High Energy Particle Physics
LHC
Cruise
Slack
Splunk
Marvin Minsky
Patrick Henry Winston
AI Winter
Sebastian Thrun
DARPA Grand Challenge
Higgs Boson
Supersymmetry
Kinematics
Transfer Learning
Foundation Models
ML Embeddings
BERT
Airflow
Dagster
Prefect
Dask
Kubeflow
MySQL
PostgreSQL
Snowflake
Redshift
S3
Kubernetes
Multi-modal models
Multi-task models
Magic: The Gathering

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
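The approach described in this episode leans on pretrained foundation models and embeddings rather than per-task training. A generic sketch of that pattern using the sentence-transformers library — not Graft’s implementation; the model name and example data are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# A pretrained foundation model produces embeddings with no
# task-specific training required.
encoder = SentenceTransformer('all-MiniLM-L6-v2')
docs = ['refund request', 'billing question', 'love the product!']
vectors = encoder.encode(docs)            # shape (3, 384) for this model

# Cosine-similarity nearest neighbor over embeddings is already enough
# for classification-like and search-like tasks.
query = encoder.encode(['I want my money back'])[0]
scores = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
print(docs[int(np.argmax(scores))])       # likely 'refund request'
```

This is the shape of capability a platform like Graft aims to hand to non-specialists: connect data, embed it, and build on similarity rather than bespoke models.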
Aug 6, 2022 • 51min

Accelerate Development And Delivery Of Your Machine Learning Projects With A Comprehensive Feature Platform

Summary

In order for a machine learning model to build connections and context across the data that is fed into it the raw data needs to be engineered into semantic features. This is a process that can be tedious and full of toil, requiring constant upkeep and often leading to rework across projects and teams. In order to reduce the amount of wasted effort and speed up experimentation and training iterations a new generation of services are being developed. Tecton first built a feature store to serve as a central repository of engineered features and keep them up to date for training and inference. Since then they have expanded the set of tools and services to be a full-fledged feature platform. In this episode Kevin Stumpf explains the different capabilities and activities related to features that are necessary to maintain velocity in your machine learning projects.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!

Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.

Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!

Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about the role of feature platforms in your ML engineering workflow.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what you mean by the term "feature platform"?
What are the components and supporting capabilities that are needed for such a platform?
How does the availability of engineered features impact the ability of an organization to put ML into production?
What are the points of friction that teams encounter when trying to build and maintain ML projects in the absence of a fully integrated feature platform?
Who are the target personas for the Tecton platform?
What stages of the ML lifecycle does it address?
Can you describe how you have designed the Tecton feature platform?
How have the goals and capabilities of the product evolved since you started working on it?
What is the workflow for an ML engineer or data scientist to build and maintain features and use them in the model development workflow?
What are the responsibilities of the MLOps stack that you have intentionally decided not to address?
What are the interfaces and extension points that you offer for integrating with the other utilities needed to manage a full ML system?
You wrote a post about the need to establish a DevOps approach to ML data. In keeping with that theme, can you describe how to think about the approach to testing and validation techniques for features and their outputs?
What are the most interesting, innovative, or unexpected ways that you have seen Tecton/Feast used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Tecton?
When is Tecton the wrong choice?
What do you have planned for the future of the Tecton feature platform?

Contact Info

LinkedIn
@kevinmstumpf on Twitter
kevinstumpf on GitHub

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Links

Tecton (Data Engineering Podcast Episode)
Uber Michelangelo
Feature Store
Snowflake (Data Engineering Podcast Episode)
DynamoDB
Train/Serve Skew
Lambda Architecture
Redis

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
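One of the core problems a feature platform solves is the train/serve skew mentioned in the links: training rows must only see feature values that were known before each labeled event. Here is a plain-pandas sketch of that point-in-time ("as-of") join, independent of Tecton’s own API, with hypothetical data:

```python
import pandas as pd

# Feature values as they existed over time (e.g. a user's 30-day spend).
features = pd.DataFrame({
    'user_id':   [1, 1, 2],
    'ts':        pd.to_datetime(['2022-01-01', '2022-02-01', '2022-01-15']),
    'spend_30d': [120.0, 80.0, 300.0],
}).sort_values('ts')

# Labeled events we want to build training rows for.
events = pd.DataFrame({
    'user_id': [1, 2],
    'ts':      pd.to_datetime(['2022-02-10', '2022-01-20']),
    'label':   [1, 0],
}).sort_values('ts')

# As-of join: each event gets the latest feature value known *before*
# the event, which prevents label leakage and train/serve skew.
training = pd.merge_asof(events, features, on='ts', by='user_id')
print(training)
```

A feature platform does this continuously and at scale, while also materializing the same definitions into a low-latency store (e.g. DynamoDB or Redis) so serving sees identical values.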
Jul 29, 2022 • 54min

Build Better Models Through Data Centric Machine Learning Development With Snorkel AI

The podcast discusses the challenges of data-centric machine learning development and how Snorkel AI's platform reduces the time and cost of building training datasets. They explore the concept of dark data, the complexity of working with different data types, and the limitations of Snorkel AI. The podcast also covers the transition from research to building a business, the biggest barrier to machine learning adoption, and the importance of properly handling data in enabling machine learning applications.
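For context on the programmatic-labeling approach discussed in the episode, here is a minimal sketch using the open source snorkel library. The rules and data are toy examples, and this is not Snorkel AI’s hosted platform:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

SPAM, HAM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_contains_link(x):
    # Heuristic from a domain expert: links are a spam signal.
    return SPAM if 'http' in x.text else ABSTAIN

@labeling_function()
def lf_short(x):
    # Very short messages tend to be benign replies.
    return HAM if len(x.text.split()) < 4 else ABSTAIN

df = pd.DataFrame({'text': [
    'buy now http://spam.example',
    'see you soon',
    'win $$$ http://x.example',
    'ok thanks',
]})

# Apply the noisy heuristics, then let the label model denoise and
# combine their votes into training labels.
L = PandasLFApplier([lf_contains_link, lf_short]).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L, n_epochs=100)
print(label_model.predict(L))
```

This is the "data-centric" move the episode describes: effort goes into writing and refining labeling functions rather than hand-annotating every example.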
Jul 21, 2022 • 1h

Declarative Machine Learning For High Performance Deep Learning Models With Predibase

Summary

Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle.

Announcements

Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.

Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!

Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!

Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.

Your host is Tobias Macey and today I’m interviewing Travis Addair about Predibase, a low-code platform for building ML models in a declarative format.

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Predibase is and the story behind it?
Who is your target audience and how does that focus influence your user experience and feature development priorities?
How would you describe the semantic differences between your chosen terminology of "declarative ML" and the "autoML" nomenclature that many projects and products have adopted?
Another platform that launched recently with a promise of "declarative ML" is Continual. How would you characterize your relative strengths?
Can you describe how the Predibase platform is implemented?
How have the design and goals of the product changed as you worked through the initial implementation and started working with early customers?
The operational aspects of the ML lifecycle are still fairly nascent. How have you thought about the boundaries for your product to avoid getting drawn into scope creep while providing a happy path to delivery?
Ludwig is a core element of your platform. What are the other capabilities that you are layering around and on top of it to build a differentiated product?
In addition to the existing interfaces for Ludwig you created a new language in the form of PQL. What was the motivation for that decision?
How did you approach the semantic and syntactic design of the dialect?
What is your vision for PQL in the space of "declarative ML" that you are working to define?
Can you describe the available workflows for an individual or team that is using Predibase for prototyping and validating an ML model?
Once a model has been deemed satisfactory, what is the path to production?
How are you approaching governance and sustainability of Ludwig and Horovod while balancing your reliance on them in Predibase?
What are some of the notable investments/improvements that you have made in Ludwig during your work of building Predibase?
What are the most interesting, innovative, or unexpected ways that you have seen Predibase used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Predibase?
When is Predibase the wrong choice?
What do you have planned for the future of Predibase?

Contact Info

LinkedIn
tgaddair on GitHub
@travisaddair on Twitter

Parting Question

From your perspective, what is the biggest barrier to adoption of machine learning today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Predibase
Horovod
Ludwig (Podcast.__init__ Episode)
Support Vector Machine
Hadoop
Tensorflow
Uber Michelangelo
AutoML
Spark ML Lib
Deep Learning
PyTorch
Continual (Data Engineering Podcast Episode)
Overton
Kubernetes
Ray
Nvidia Triton
Whylogs (Data Engineering Podcast Episode)
Weights and Biases
MLFlow
Comet
Confusion Matrices
dbt (Data Engineering Podcast Episode)
Torchscript
Self-supervised Learning

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0
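Since Ludwig is the declarative core under Predibase, a minimal sketch shows what "declare the schema, not the architecture" means in practice. The CSV file and column names are hypothetical:

```python
from ludwig.api import LudwigModel

# The model is specified declaratively: name the inputs and outputs and
# their types, and the framework picks an architecture and training loop.
config = {
    'input_features': [
        {'name': 'review_text', 'type': 'text'},
    ],
    'output_features': [
        {'name': 'sentiment', 'type': 'category'},
    ],
}

model = LudwigModel(config)
train_stats, _, _ = model.train(dataset='reviews.csv')   # hypothetical CSV
predictions, _ = model.predict(dataset='reviews.csv')
print(predictions.head())
```

PQL, as described in the episode, extends this idea to the warehouse: the same declarative intent expressed as a SQL-like predictive query instead of a Python config.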
