
MLOps.community

Latest episodes

Nov 3, 2020 • 1h 1min

Analyzing the Google Paper on Continuous Delivery in ML // Part 4 // MLOps Coffee Sessions #17

MLOps level 2: CI/CD pipeline automation

For a rapid and reliable update of the pipelines in production, you need a robust automated CI/CD system. This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.

[Figure 4: CI/CD and automated ML pipeline.]

This MLOps setup includes the following components:
- Source control
- Test and build services
- Deployment services
- Model registry
- Feature store
- ML metadata store
- ML pipeline orchestrator

We also discuss the characteristics of each stage.

[Figure 5: Stages of the CI/CD automated ML pipeline.]

The pipeline consists of the following stages (a hedged sketch of the automated-triggering stage follows the links below):
- Development and experimentation: You iteratively try out new ML algorithms and new modelling ideas in an environment where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which is then pushed to a source repository.
- Pipeline continuous integration: You build the source code and run various tests. The outputs of this stage are the pipeline components (packages, executables, and artefacts) to be deployed in a later stage.
- Pipeline continuous delivery: You deploy the artefacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.
- Automated triggering: The pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
- Model continuous delivery: You serve the trained model as a prediction service. The output of this stage is a deployed model prediction service.
- Monitoring: You collect statistics on model performance from live data. The output of this stage is a trigger to execute the pipeline or to start a new experiment cycle.

The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.

Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
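To make the automated-triggering stage concrete, here is a minimal, hypothetical Python sketch. The names (train_pipeline, ModelRegistry, on_trigger) are illustrative, not APIs from the paper, and the toy file-based registry stands in for a real one such as MLflow or Vertex AI.

```python
# Illustrative sketch of "automated triggering" pushing a trained model
# to a registry. All names here are hypothetical stand-ins.
import datetime
import pathlib
import pickle

class ModelRegistry:
    """Toy file-based model registry; real setups use MLflow, Vertex AI, etc."""
    def __init__(self, root: str = "registry"):
        self.root = pathlib.Path(root)
        self.root.mkdir(exist_ok=True)

    def push(self, model, name: str) -> pathlib.Path:
        # Version by timestamp so every triggered run is traceable.
        version = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        path = self.root / f"{name}-{version}.pkl"
        path.write_bytes(pickle.dumps(model))
        return path

def train_pipeline(training_data):
    # Stand-in for the orchestrated pipeline: validate data, engineer
    # features, train, evaluate. Here we just return a dummy "model".
    return {"weights": sum(training_data) / len(training_data)}

def on_trigger(training_data, registry: ModelRegistry):
    # Called on a schedule or when a trigger fires (new data, drift alert).
    model = train_pipeline(training_data)
    return registry.push(model, name="demo-model")

if __name__ == "__main__":
    print(on_trigger([1.0, 2.0, 3.0], ModelRegistry()))
```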
Oct 30, 2020 • 58min

Hands-on Serving Models Using KFServing // Theofilos Papapanagiotou // Data Science Architect at Prosus // MLOps Meetup #40

MLOps community meetup #40! Last Wednesday, we talked to Theofilos Papapanagiotou, Data Science Architect at Prosus, about Hands-on Serving Models Using KFServing.

// Abstract:
We looked at some popular model formats like TensorFlow's SavedModel, PyTorch's Model Archiver, pickle & ONNX, to understand how the weights of the NN are saved there, along with the graph and signature concepts. We discussed the relevant resources of the deployment stack of Istio (the ingress gateway, the sidecar, and the virtual service) and Knative (the service and revisions), as well as Kubeflow and KFServing. Then we got into the design details of KFServing: its custom resources, the controller and webhooks, logging, and configuration. We spent a large part of the session on the monitoring stack: the metrics of the servable (memory footprint, latency, number of requests), as well as model metrics like the graph, init/restore latencies, the optimizations, and the runtime metrics, which all end up in Prometheus. We looked at inference payload and prediction logging to observe drift and trigger retraining of the pipeline. Finally, a few words about the awesome community and the project roadmap on multi-model serving and the inference routing graph. A hedged example of calling a KFServing prediction endpoint follows the links below.

// Bio:
Theo is a recovering Unix engineer with 20 years of work experience in telcos, internet services, video delivery, and cybersecurity. He is also a university student for life: BSc in CS 1999, MSc in Data Comms 2008, and MSc in AI 2017. Nowadays he calls himself an ML Engineer, as this role lets him express his passion for systems engineering and machine learning. His analytical thinking is driven by curiosity and a hacker spirit. His skills span a variety of areas: statistics, programming, databases, distributed systems, and visualization.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Theofilos on LinkedIn: https://linkedin.com/in/theofpa
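As a companion to the serving discussion above, here is a minimal sketch of calling a KFServing InferenceService over its v1 HTTP protocol. The host and model name are hypothetical; the {"instances": [...]} request shape is the standard v1 prediction protocol shared with TensorFlow Serving.

```python
# Minimal client sketch for a KFServing (now KServe) v1 prediction endpoint.
import requests

def predict(host: str, model: str, instances: list):
    # v1 protocol: POST /v1/models/<name>:predict with {"instances": [...]}.
    url = f"http://{host}/v1/models/{model}:predict"
    resp = requests.post(url, json={"instances": instances}, timeout=10)
    resp.raise_for_status()
    return resp.json()["predictions"]

# Example (assumes an InferenceService named "flowers" is reachable at this
# hypothetical host):
# print(predict("flowers.default.example.com", "flowers", [[5.1, 3.5, 1.4, 0.2]]))
```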
Oct 27, 2020 • 57min

Operationalize Open Source Models with SAS Open Model Manager // Ivan Nardini // Customer Engineer at SAS // MLOps Meetup #39

MLOps community meetup #39! Last week we talked to Ivan Nardini, Customer Engineer at SAS, about Operationalizing Open Source Models with SAS Open Model Manager.

// Abstract:
Analytics are open. By their nature, open-source technologies allow agile development of models, but they make it difficult to put those models into production. SAS's goal is to support customers in operationalizing analytics. In this meetup, I present SAS Open Model Manager, a containerized ModelOps tool that accelerates deployment processes and, once models are in production, lets you monitor them (both SAS and open source).

// Bio:
As a member of the Pre-Sales CI & Analytics Support Team, I specialize in ModelOps and Decisioning. I've been involved in operationalizing analytics using different open-source technologies across a variety of industries. My focus is on providing solutions to deploy, monitor, and govern models in production and to optimize business decision processes. To reach this goal, I work with software technologies (the SAS Viya platform, containers, CI/CD tools) and cloud (AWS).

// Other links to check out Ivan on:
https://medium.com/@ivannardini

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Ivan on LinkedIn: https://www.linkedin.com/in/ivan-nardini

Timestamps:
0:00 - Intro to Ivan Nardini
3:41 - Operationalize Open Source Models with SAS Open Model Manager slide
4:21 - Agenda
5:01 - What is ModelOps, and what is the difference between MLOps and ModelOps?
6:19 - "Do I look like an expert?" Ivan's background
7:12 - Why ModelOps?
7:20 - Operationalizing Analytics
8:12 - Operationalizing Analytics: SAS
9:08 - Operationalizing Analytics: Customer
11:36 - What's a model for you?
12:07 - Hidden Complexity in ML Systems
12:52 - Hidden Complexity in ML Systems: Business Perspective
14:12 - Hidden Complexity in ML Systems: IT Perspective
17:12 - Is security one of the hardest things?
17:52 - Hidden Complexity in ML Systems: Analytics Perspective
19:20 - Why ModelOps?
20:09 - ModelOps Technologies Map
22:29 - Customers' ModelOps Maturity over Technology Propensity. MLOps Maturity vs. Technology Propensity
26:23 - Show us your analytical models
26:56 - SAS can support you in shipping them to production, providing governance and decisioning
27:28 - When you talk to people, do you feel there is a unified model but a focus on the wrong thing?
29:14 - Have you seen reproducibility and governance?
30:47 - Advertising time
30:55 - Operationalize Open Source Models with SAS Open Model Manager
31:02 - ModelOps with SAS
32:06 - SAS Open Model Manager
33:18 - Demo
33:27 - SAS ModelOps Architecture - Classification Model
35:02 - Model Demo: Credit Scoring Business Application
50:20 - Take-homes
50:24 - Operationalize Analytics
50:32 - Model Lifecycle Effort Side
51:20 - Business Value Side
51:47 - Typical Analytics Operationalization Graph
52:18 - Analytics Operationalization with ModelOps Graph
53:18 - Is this for everybody?
Oct 26, 2020 • 57min

Machine Learning in Production = Data Engineering + ML + Software Engineering // Satish Chandra Gupta // MLOps Coffee Sessions #16

// Bio
Satish built compilers, profilers, IDEs, and other dev tools for over a decade. At Microsoft Research, he saw his colleagues solving hard program analysis problems using machine learning. That is when he got curious and started learning. His approach to ML is influenced by his software engineering background of building things for production.

He has a keen interest in doing ML in production, which is a lot more than training and tuning models. The first step is to understand the product and business context, then build an efficient pipeline, then train models, and finally monitor their efficacy and impact on the business. He considers ML another tool in the software engineering toolbox, albeit a very powerful one. He is a co-founder of Slang Labs, a Voice Assistant as a Service platform for building in-app voice assistants.

// Talk Takeaways
- ML-driven product features will grow manifold.
- Organizations take an evolutionary approach to absorbing tech innovations; ML will be no exception. How organizations adopted the cloud can offer useful lessons.
- ML/DS folks who invest in understanding the business context and tech environment of their org will make a bigger impact.
- Organizations that invest in data infrastructure will be more successful in extracting value from machine learning.

// Other links to check Satish on
An Engineer's Trek into Machine Learning: https://scgupta.link/ml-intro-for-developers
Architecture for a High-Throughput, Low-Latency Big Data Pipeline on Cloud: https://scgupta.link/big-data-pipeline-architecture
Data pipeline article: https://scgupta.link/big-data-pipeline-architecture or https://towardsdatascience.com/scalable-efficient-big-data-analytics-machine-learning-pipeline-architecture-on-cloud-4d59efc092b5
Tips for software engineers based on my experience of getting into ML: https://scgupta.link/ml-intro-for-developers or https://towardsdatascience.com/software-engineers-trek-into-machine-learning-46b45895d9e0
LinkedIn: https://www.linkedin.com/in/scgupta
Twitter: https://twitter.com/scgupta
Personal website: http://scgupta.me
Company website: https://slanglabs.in
Voice assistants info: https://www.slanglabs.in/voice-assistants

Timestamps:
0:00 - Intro to Satish Chandra Gupta
1:05 - Satish's background in machine learning
3:29 - What Satish is doing now
5:34 - Why were you interested in the challenges of the workload?
9:53 - As you're looking at the data pipeline, do you see much overlap there?
15:38 - Relationships between engineering pipeline characteristics and how they relate to data
20:24 - Tips for saving money when you're building these pipelines
24:44 - First point of engagement: collection
31:26 - Possibilities of data architecture
38:03 - Why is it beneficial to save money?
44:22 - Satish's learnings from his current project, Voice Assistant as a Service
Oct 20, 2020 • 1h 2min

MLOps + Machine Learning // James Sutton // MLOps Coffee Sessions #15

James Sutton is an ML engineer focused on helping enterprises bridge the gap between what they have now and where they need to be to enable production-scale ML deployments.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with James on LinkedIn: https://www.linkedin.com/in/jamessutton2/

Timestamps:
0:00 - Intro to speaker
2:20 - Scope of the coffee session
3:10 - Background of James Sutton
8:28 - One-shot classifier algorithms
12:46 - Why is deployment a challenge from the engineering perspective?
19:20 - How to overcome bottlenecks?
30:07 - What's your vision of the landscape?
34:45 - How maturity plays out
38:48 - Maturity perspective of ML
41:49 - The risk of overgeneralizing system design patterns
46:10 - Reliability, speed, cost
46:46 - Consistency, availability, partition tolerance (the CAP theorem)
47:36 - How do you go about discussing these tradeoffs with your clients?
51:23 - How would you deal with PII?
58:50 - The collaborative process with clients
1:00:55 - Wrap-up
Oct 19, 2020 • 57min

Scalable Python for Everyone, Everywhere // Matthew Rocklin // MLOps Meetup #38

Parallel Computing with Dask and Coiled

Python makes data science and machine learning accessible to millions of people around the world. Historically, however, Python hasn't handled parallel computing well, which leads to issues as researchers try to tackle problems on increasingly large datasets. Dask is an open-source Python library that extends the existing Python data science stack (NumPy, Pandas, Scikit-Learn, Jupyter, ...) with parallel and distributed computing. Today Dask has been broadly adopted by most major Python libraries and is maintained by a robust open-source community across the world.

This talk discusses parallel computing generally, Dask's approach to parallelizing an existing ecosystem of software, and the challenges of robustly deploying distributed systems, which ends up being one of the main accessibility challenges for users today. We hope that by the end of the meetup attendees will better understand parallel computing, have built intuition around how Dask works, and have the opportunity to play with their own Dask cluster on the cloud. A hedged example of a typical Dask workload follows the links below.

Matthew is an open-source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled Computing to improve Python's scalability with Dask for large organizations. Matthew has given talks at a variety of technical, academic, and industry conferences; a list of talks and keynotes is available at https://matthewrocklin.com/talks. Matthew holds a bachelor's degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago.

Check out our posts here to get more context around where we're coming from:
https://medium.com/coiled-hq/coiled-dask-for-everyone-everywhere-376f5de0eff4
https://medium.com/coiled-hq/the-unbearable-challenges-of-data-science-at-scale-83d294fa67f8

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Matthew on LinkedIn: https://www.linkedin.com/in/matthew-rocklin-461b4323/
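As a taste of the workloads the talk describes, here is a minimal Dask sketch: a pandas-style groupby over a dataset that may not fit in memory. The CSV path is illustrative, and it assumes `pip install "dask[complete]"`.

```python
# Run a pandas-style aggregation in parallel across partitions with Dask.
import dask.dataframe as dd
from dask.distributed import Client

if __name__ == "__main__":
    client = Client()  # local scheduler + workers; Coiled/Kubernetes in production
    df = dd.read_csv("data/trips-*.csv")           # lazily builds a partitioned frame
    result = df.groupby("pickup_zone")["fare"].mean()
    print(result.compute())                        # triggers the parallel execution
    client.close()
```

Nothing computes until `.compute()` is called; Dask first builds a task graph over the partitions and then schedules it across the workers.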
Oct 18, 2020 • 1h 1min

MLOps Coffee Sessions #13 How to Choose the Right Machine Learning Tool: A Conversation // Jose Navarro and Mariya Davydova

This time we talked about one of the most vibrant questions for any MLOps practitioner: how to choose the right tools for your ML team, given the huge number of open-source and proprietary MLOps tools available on the market today. We discussed several criteria to rely on when choosing a tool, including:
- The requirements of the team's particular use cases
- The scaling capacity of the tool
- The cost of migrating away from a chosen tool
- The cost of teaching the team to use the tool
- The company or community behind the tool

Apart from that, we talked about particular use cases and discussed the trade-offs between waiting for a new release of your tool to get a missing piece of functionality, switching to another tool, and building an in-house solution. We also touched on the topic of organising MLOps teams and practices across large companies with many ML teams.

// Bio:
Jose Navarro
Jose Navarro is a Machine Learning Infrastructure Engineer making everyday cooking fun at Cookpad, whose recipe platform has more than 40 million monthly users. He holds an MSc in Machine Learning and High-Performance Computing from the University of Bristol. He is interested in cloud-native technologies, serverless, and event-driven architecture.

Mariya Davydova
Mariya came to MLOps from a software development background. She started her career as a Java developer at JetBrains in 2011, then gradually moved to developer advocacy for JS-based APIs. In 2019, she joined Neu.ro as a platform developer advocate and then moved to a product management position. Mariya has been obsessed with AI and ML for many years: she has finished a bunch of courses, read a lot of books, and even written a couple of fiction stories about AI. She believes that proper tooling and decent development and operations practices are an essential success component for ML projects, just as they are for traditional software development.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Jose on LinkedIn: https://www.linkedin.com/in/jose-navarro-2a57b612/
Connect with Mariya on LinkedIn: https://www.linkedin.com/in/mariya-davydova/
Oct 12, 2020 • 57min

MLOps Coffee Sessions #14 Conversation with the Creators of Dask // Hugo Bowne-Anderson and Matthew Rocklin

Dask - what is it? Parallelism for analytics.

What is parallelism? Doing a lot at once by splitting a task into smaller subtasks that can be processed in parallel (at the same time), distributing the work across multiple machines, and then combining the results. It is helpful for CPU-bound work - doing a bunch of calculations where the rate at which the process progresses is limited by the speed of the CPU.

Concurrency? Similar, but things don't have to happen at the same time: they can happen asynchronously and overlap, often with shared state. It is helpful for I/O-bound work - networking, reading from disk, etc. - where the rate at which the process progresses is limited by the speed of the I/O subsystem. (A short sketch contrasting the two follows the timestamps below.)

Multi-core vs distributed: multi-core means a single processor with two or more cores that can cooperate through threads (multithreading); distributed means multiple nodes communicating via HTTP or RPC.

Why is this hard? Python has its challenges due to the GIL (other languages don't have this problem), shared state can lead to race conditions, deadlocks, etc., and work has to be coordinated across machines.

Why for analytics? Calculating even simple statistics on a large dataset can be tricky if it can't fit in memory.

// Show Notes
Coiled Cloud: https://cloud.coiled.io/
Coiled launch announcement: https://medium.com/coiled-hq/coiled-dask-for-everyone-everywhere-376f5de0eff4
OSS article: https://www.forbes.com/sites/glennsolomon/2020/09/15/monetizing-open-source-business-models-that-generate-billions/#2862e47234fd
Amish barn raising: https://www.youtube.com/watch?v=y1CPO4R8o5M
Message Passing Interface: https://en.wikipedia.org/wiki/Message_Passing_Interface

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Matthew on LinkedIn: https://www.linkedin.com/in/matthew-rocklin-461b4323/

Timestamps:
0:00 - Intro to Matthew Rocklin and Hugo Bowne-Anderson
0:37 - Matthew Rocklin's background
1:17 - Hugo Bowne-Anderson's background
3:47 - Where did that inspiration come from?
10:04 - Is there a close relationship between best practices and tooling, or are these two separate things?
11:27 - Why is data literacy important to Coiled?
14:46 - How do you think about the balance in enabling data scientists to have a lot of powerful compute?
17:05 - Machine learning as a space for tracking best-practice experimentation
19:32 - What makes data science so difficult?
24:07 - How can a for-profit company complement open-source software (OSS)?
29:40 - Amazon becoming a competitor with your own open-source technology
32:50 - How do you encourage more people to contribute and ensure quality?
34:58 - Do you see Coiled operating within the Dask ecosystem?
37:30 - What is Dask?
39:19 - What should people know about parallelism?
41:28 - Why is it so hard to put things back together?
41:34 - Why does Python need a whole new tool to enable that? Or maybe some other tools as well?
44:44 - Dynamic task scheduling as being useful to data scientists
47:15 - Why is reliability in particular important in data science?
52:27 - What's in store for Dask?
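Here is the promised sketch contrasting CPU-bound parallelism with I/O-bound concurrency, using only the Python standard library; the workloads are toy stand-ins.

```python
# CPU-bound work wants processes (each with its own GIL); I/O-bound work
# does fine with threads, since the GIL is released while waiting.
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))   # pure computation, holds the GIL

def io_bound(seconds: float) -> float:
    time.sleep(seconds)                   # stand-in for a network/disk wait
    return seconds

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:   # parallelism: separate interpreters
        print(sum(pool.map(cpu_bound, [2_000_000] * 4)))
    with ThreadPoolExecutor() as pool:    # concurrency: overlapping waits
        print(sum(pool.map(io_bound, [0.5] * 4)))
```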
Oct 10, 2020 • 1h 5min

MLOps Coffee Sessions #12: Journey of Flyte at Lyft and Through Open-source // Ketan Umare

Why was Flyte built at Lyft? What sorts of requirements does an ML infrastructure team have at Lyft? What problems does it solve, and for which use cases? Where does it fit in the ML and data ecosystem? What is the vision? Who should consider using it? And what did the engineering team learn as it tried to bootstrap an open-source community?

// Bio
Ketan Umare is a senior staff software engineer at Lyft responsible for the technical direction of the Machine Learning Platform, and is a founder of the Flyte project. Before Flyte he worked on ETA, routing, and mapping infrastructure at Lyft. He is also the founder of the Flink Kubernetes operator and a contributor to Spark on Kubernetes. Prior to Lyft he was a founding member of Oracle Bare Metal Cloud, where he led teams building Elastic Block Storage. Before that, he started and led multiple teams in mapping and transportation-optimization infrastructure at Amazon. He received his Master's in Computer Science from Georgia Tech, specializing in high-performance computing, and his Bachelor of Engineering in Computer Science from VJTI Mumbai. Besides work, he enjoys spending time with his daughter and wife. He loves the Pacific Northwest outdoors and will try anything new.

// Show Notes
Flyte at Lyft: pricing, locations, estimated time of arrival (ETA), mapping, self-driving (L5), etc.
What sort of scale, storage, and network bandwidth are we looking at? Tens of thousands of workflows, hundreds of thousands of executions, millions of tasks, and tens of millions of containers! Flyte runs more than 900k workflow executions a month and more than 30 million container executions per month.
Typical flow of information? What are the user stories you're typically dealing with at Lyft?
How do you set it up? On-prem, cloud, etc. Helm installable? Why Golang?
What problems does it solve? Complex data dependencies, orchestrated compute on demand, reuse and sharing.
Key features: multi-tenant, hosted, serverless; parametrized, with data lineage and caching. If a run invokes a task that has already been computed before, regardless of who executed it, Flyte will smartly use the cached output, saving you both time and money. Versioning and sharing; modular and loosely coupled. (A hedged sketch of a Flyte workflow follows the links below.)
It seems the team recognized that the best tool for a job might be hosted elsewhere, so it was important to integrate other solutions into Flyte: Flyte extensions and backend plugins. Is it true you can create and manage k8s resources like CRDs for things like Spark, SageMaker, and BigQuery?

Drop a star: https://flyte.org
Flyte community

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Ketan on LinkedIn: https://www.linkedin.com/in/ketanumare/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
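Here is the hedged sketch mentioned above: a toy workflow written against the modern flytekit decorator API (pip install flytekit). The task and workflow names are illustrative; cache=True shows the output-caching feature discussed in the episode.

```python
# Toy Flyte workflow: a cached data-prep task feeding a "training" task.
from typing import List
from flytekit import task, workflow

@task(cache=True, cache_version="1.0")  # repeat runs reuse the cached output
def prepare(n: int) -> List[float]:
    return [float(i * i) for i in range(n)]

@task
def train(data: List[float]) -> float:
    # Stand-in for model training: return a toy "metric".
    return sum(data) / len(data)

@workflow
def pipeline(n: int = 100) -> float:
    return train(data=prepare(n=n))

if __name__ == "__main__":
    print(pipeline(n=10))  # workflows are locally executable for testing
```

Registered on a Flyte cluster, each task runs in its own container and the workflow becomes a versioned, shareable entity, which is where the multi-tenancy and caching features above come into play.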
Oct 4, 2020 • 1h 6min

MLOps Coffee Sessions #11: Analyzing “Continuous Delivery and Automation Pipelines in ML" // Part 3

Round 3 of analyzing the Google paper "Continuous Delivery and Automation Pipelines in ML".

// Show Notes
Data science steps for ML (a hedged sketch of the training, evaluation, and validation steps follows the links below):
- Data extraction: You select and integrate the relevant data from various data sources for the ML task.
- Data analysis: You perform exploratory data analysis (EDA) to understand the available data for building the ML model. This process leads to understanding the data schema and characteristics that are expected by the model, and identifying the data preparation and feature engineering that are needed for the model.
- Data preparation: The data is prepared for the ML task. This preparation involves data cleaning, where you split the data into training, validation, and test sets. You also apply data transformations and feature engineering for the model that solves the target task. The output of this step is the data splits in the prepared format.
- Model training: The data scientist implements different algorithms with the prepared data to train various ML models. In addition, you subject the implemented algorithms to hyperparameter tuning to get the best-performing ML model. The output of this step is a trained model.
- Model evaluation: The model is evaluated on a holdout test set to assess its quality. The output of this step is a set of metrics describing the quality of the model.
- Model validation: The model is confirmed to be adequate for deployment - that is, its predictive performance is better than a certain baseline.
- Model serving: The validated model is deployed to a target environment to serve predictions. This deployment can be one of the following: microservices with a REST API to serve online predictions, an embedded model on an edge or mobile device, or part of a batch prediction system.
- Model monitoring: The model's predictive performance is monitored to potentially invoke a new iteration of the ML process.

The level of automation of these steps defines the maturity of the ML process, which reflects the velocity of training new models given new data or new implementations. The paper describes three levels of MLOps, starting from the most common level, which involves no automation, up to automating both the ML and CI/CD pipelines. In the rest of the conversation, we talk about maturity levels 0 and 1. Next session we will talk about level 2.

Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
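As the promised sketch, here is a compact scikit-learn rendition of the preparation, training, evaluation, and validation steps above. The dataset and baseline threshold are illustrative, not from the paper.

```python
# Manual ML steps (level 0 style): prepare -> train -> evaluate -> validate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data extraction + preparation: split into train and holdout test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Model training (hyperparameter tuning would sweep n_estimators, etc.).
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Model evaluation on the holdout set.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Model validation: confirm performance beats a baseline before serving.
BASELINE = 0.90  # illustrative threshold
assert accuracy > BASELINE, f"accuracy {accuracy:.3f} below baseline"
print(f"validated: accuracy={accuracy:.3f}")
```

At maturity level 0 every one of these steps is run by hand; levels 1 and 2 progressively automate them into a pipeline triggered by new data or new code.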
