

MLA 020 Kubeflow and ML Pipeline Orchestration on Kubernetes
Jan 29, 2022
01:08:47
ML Pipelines vs. Standalone Models
- Consider building machine learning pipelines instead of standalone models.
- Pipelines help track model performance, automate retraining, and save time.
Data Changes and Model Retraining
- Data changes over time, especially due to unexpected events like the COVID pandemic.
- Regularly retraining models within pipelines is crucial for maintaining accuracy and relevance.
Kubeflow Overview
- Kubeflow is an open-source ML pipeline orchestrator built on Kubernetes.
- It offers scalability, flexibility, and integrates various ML frameworks.
Introduction
00:00 • 4min
Machine Learning Pipeline Orchestration Tools
04:14 • 2min
Machine Learning and Pipelines on Kubernetes
06:18 • 2min
Machine Learning Pipelines - Is Continuous Integration a Good Idea?
07:52 • 3min
Do Pipelines Keep Up With the Times?
11:16 • 2min
Kubeflow - A Solution for Machine Learning Pipelines
12:47 • 2min
Kubeflow vs Cloud-Native Machine Learning?
15:02 • 3min
Cloud Development vs Local Development?
18:17 • 3min
Go Cloud
20:56 • 2min
GCP - Is There a Platform for Pipeline Orchestration?
22:32 • 1min
Vertex AI
24:01 • 4min
Is TensorFlow Extended a Good Machine Learning Framework?
27:42 • 2min
Using Open Source Tooling for Machine Learning Models
29:40 • 4min
Kubeflow
33:16 • 2min
Do You Use Airflow for Machine Learning?
35:08 • 2min
Kubeflow vs TFX - What's the Difference?
36:51 • 2min
Machine Learning on a Cloud Platform?
38:37 • 3min
Using Spot Instances for Model Training
41:58 • 2min
Cloud Computing
43:50 • 4min
Using a Cloud Platform
47:21 • 6min
Is It Better to Use Kubeflow or Vertex AI?
53:00 • 3min
Kubeflow, TensorFlow Extended, or Anything Else?
55:36 • 2min
What's the Value of a Master's Degree?
57:21 • 4min
Data Science Master's - Computer Science, Stats or Math?
01:01:29 • 2min
Getting a Master's Degree Online Is a Good Idea
01:03:14 • 5min
Machine learning pipeline orchestration tools, such as SageMaker and Kubeflow, streamline the end-to-end process of data ingestion, model training, deployment, and monitoring, with Kubeflow providing an open-source, cross-cloud platform built atop Kubernetes. Organizations typically choose between cloud-native managed services and open-source solutions based on required flexibility, scalability, integration with existing cloud environments, and vendor lock-in considerations.
Links: Notes and resources at ocdevel.com/mlg/mla-20
- Try a walking desk to stay healthy & sharp while you learn & code
Dirk-Jan Verdoorn - Data Scientist at Dept Agency
Managed vs. Open-Source ML Pipeline Orchestration
- Cloud providers such as AWS, Google Cloud, and Azure offer managed machine learning orchestration solutions, including SageMaker (AWS) and Vertex AI (GCP).
- Managed services provide integrated environments that are easier to set up and operate but often result in vendor lock-in, limiting portability across cloud platforms.
- Open-source tools like Kubeflow extend Kubernetes to support end-to-end machine learning pipelines, enabling portability across AWS, GCP, Azure, or on-premises environments.
- Kubeflow is an open-source project aimed at making machine learning workflow deployment on Kubernetes simple, portable, and scalable.
- Kubeflow enables data scientists and ML engineers to build, orchestrate, and monitor pipelines using popular frameworks such as TensorFlow, scikit-learn, and PyTorch.
- Kubeflow can integrate with TensorFlow Extended (TFX) for complete end-to-end ML pipelines, covering data ingestion, preprocessing, model training, evaluation, and deployment.
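As an illustration of the kind of pipeline the bullets above describe, here is a minimal sketch using the Kubeflow Pipelines SDK (kfp, v1-style API). The step bodies, container images, and paths are placeholders for illustration, not a prescribed setup.

```python
# Minimal sketch of a two-step pipeline with the Kubeflow Pipelines SDK
# (kfp, v1-style API). Images, paths, and the step bodies are placeholders.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def preprocess(data_path: str) -> str:
    # Placeholder: clean/split the raw data, return the processed location.
    return data_path + "/processed"


def train(processed_path: str) -> str:
    # Placeholder: fit a model (TensorFlow, scikit-learn, PyTorch, ...) and
    # return a model artifact URI.
    return processed_path + "/model"


preprocess_op = create_component_from_func(preprocess, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")


@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(data_path: str = "gs://my-bucket/raw"):
    processed = preprocess_op(data_path)
    train_op(processed.output)


if __name__ == "__main__":
    # Compile to a spec the Kubeflow Pipelines UI or client can run.
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```

Compiling produces a portable pipeline definition that can be uploaded to any Kubeflow Pipelines installation, regardless of which cloud hosts the cluster.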
- Production machine learning systems involve not just model training but also complex pipelines for data ingestion, feature engineering, validation, retraining, and monitoring.
- Pipelines automate retraining based on model performance drift or updated data, supporting continuous improvement and adaptation to changing data patterns.
- Scalable, orchestrated pipelines reduce manual overhead, improve reproducibility, and ensure that models remain accurate as underlying business conditions evolve.
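A hypothetical sketch of the drift-based retraining trigger mentioned above; the metric (AUC), the tolerance, and the way a new run would be launched are assumptions made for illustration.

```python
# Hypothetical sketch of a drift-based retraining trigger. The metric,
# tolerance, and the launch mechanism are all assumptions.
def should_retrain(current_auc: float, baseline_auc: float, tolerance: float = 0.02) -> bool:
    # Retrain once live performance falls more than `tolerance` below the
    # metric recorded when the current model was deployed.
    return (baseline_auc - current_auc) > tolerance


if should_retrain(current_auc=0.81, baseline_auc=0.86):
    # In practice this branch would ask the orchestrator (e.g. a Kubeflow
    # Pipelines client) to start a new training run.
    print("Performance drift detected: trigger retraining pipeline")
```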
- ML pipeline orchestration tools in machine learning fulfill a role similar to continuous integration and continuous deployment (CI/CD) in traditional software engineering.
- Pipelines enable automated retraining, modularization of pipeline steps (such as ingestion, feature transformation, and deployment), and robust monitoring.
- Adopting pipeline orchestrators, rather than maintaining standalone models, helps organizations handle multiple models and varied business use cases efficiently.
- Managed services (e.g., SageMaker, Vertex AI) offer streamlined user experiences and seamless integration but restrict cross-cloud flexibility.
- Kubeflow, as an open-source platform on Kubernetes, enables cross-platform deployment, integration with multiple ML frameworks, and minimizes dependency on a single cloud provider.
- The complexity of Kubernetes and Kubeflow setup is offset by significant flexibility and community-driven improvements.
- Kubeflow operates on any Kubernetes environment including AWS EKS, GCP GKE, and Azure AKS, as well as on-premises or local clusters.
- Local and cross-cloud development are facilitated in Kubeflow, while managed services like SageMaker and Vertex AI are better suited to cloud-native workflows.
- Debugging and development workflows can be challenging in highly secured cloud environments; Kubeflow’s local deployment flexibility addresses these hurdles.
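Because the compiled pipeline is just an artifact, submitting a run looks the same whether the cluster runs on EKS, GKE, AKS, or a local machine. A small sketch, assuming the YAML compiled in the earlier example and a port-forwarded Kubeflow Pipelines endpoint:

```python
import kfp

# The endpoint URL is an assumption: e.g. a port-forwarded ml-pipeline
# service, regardless of which cloud (or laptop) hosts the cluster.
client = kfp.Client(host="http://localhost:8080")

# Submit the pipeline compiled earlier; arguments override pipeline defaults.
client.create_run_from_pipeline_package(
    "training_pipeline.yaml",
    arguments={"data_path": "gs://my-bucket/raw"},
)
```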
- TensorFlow Extended (TFX) is an end-to-end platform for creating production ML pipelines, tightly integrated with Kubeflow for deployment and execution.
- While Kubeflow originally focused on TensorFlow, it has grown to support PyTorch, scikit-learn, and other major ML frameworks, offering wider applicability.
- TFX provides modular pipeline components (data ingestion, transformation, validation, model training, evaluation, and deployment) that execute within Kubeflow’s orchestration platform.
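To give a sense of how TFX's modular components chain together, here is a minimal sketch using the TFX 1.x API with a local runner; the data path, trainer module, and serving directory are placeholders, and on Kubeflow the same components would be executed by a Kubeflow runner instead.

```python
from tfx import v1 as tfx

# Placeholder locations; in practice these point at real data and storage.
DATA_ROOT = "data/"                # directory of CSV training data
PIPELINE_ROOT = "pipeline_root/"   # where TFX writes artifacts and metadata
SERVING_DIR = "serving_model/"     # where the pushed model lands
TRAINER_MODULE = "trainer.py"      # user-provided module defining run_fn()

example_gen = tfx.components.CsvExampleGen(input_base=DATA_ROOT)
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])
trainer = tfx.components.Trainer(
    module_file=TRAINER_MODULE,
    examples=example_gen.outputs["examples"],
    train_args=tfx.proto.TrainArgs(num_steps=1000),
    eval_args=tfx.proto.EvalArgs(num_steps=100),
)
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(base_directory=SERVING_DIR)
    ),
)

pipeline = tfx.dsl.Pipeline(
    pipeline_name="example_tfx_pipeline",
    pipeline_root=PIPELINE_ROOT,
    components=[example_gen, statistics_gen, schema_gen, trainer, pusher],
)

# LocalDagRunner executes everything on the local machine; a Kubeflow runner
# would execute the same components on a cluster.
tfx.orchestration.LocalDagRunner().run(pipeline)
```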
- Airflow is a general-purpose workflow orchestrator that models work as DAGs; it is well suited to data engineering and automation, but it is not designed to run resource-intensive ML training inside its own workers.
- Airflow often submits jobs to external compute resources (e.g., AI Platform) for resource-intensive workloads.
- In organizations using both Kubeflow and Airflow, Airflow may handle data workflows, while Kubeflow is reserved for ML pipelines.
- MLflow and other solutions also exist, each with unique integrations and strengths; their adoption depends on use case requirements.
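For contrast with the Kubeflow examples, a minimal Airflow DAG sketch in the pattern described above: lightweight tasks in the DAG itself, with heavy training handed off to external compute (the submit step here is a placeholder, not a specific service integration).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_features():
    # Placeholder: pull and transform the day's data.
    print("building features")


def submit_training_job():
    # Placeholder: hand the heavy lifting to external compute
    # (e.g. a managed training service) rather than the Airflow worker.
    print("submitting training job")


with DAG(
    dag_id="daily_ml_workflow",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    features = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="submit_training_job", python_callable=submit_training_job)
    features >> train
```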
- The optimal choice of cloud platform and orchestration tool is typically guided by client needs, existing integrations (e.g., organizational use of Google or Microsoft solutions), and team expertise.
- Agencies with diverse client portfolios often benefit from open-source, cross-cloud tools like Kubeflow to maximize flexibility and knowledge sharing across projects.
- Users entrenched in a single cloud provider may prefer managed offerings for ease of use and integration, while those prioritizing portability and flexibility often choose open-source solutions.
- Both AWS and GCP offer cost-saving compute options for training, such as spot instances (AWS) and preemptible instances (GCP), which are suitable for non-production, batch training jobs.
- Production workloads that require high uptime and reliability do not typically utilize cost-saving transient compute resources, as these can be interrupted.
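One way this shows up in a Kubeflow pipeline is by steering an individual training step onto spot/preemptible nodes; a hypothetical sketch using kfp v1's ContainerOp, where the node labels and the existence of a matching node pool are assumptions about the cluster.

```python
# Hypothetical sketch: let one pipeline step run on preemptible/spot nodes.
# The GKE label shown is standard; on EKS the analogous label is
# eks.amazonaws.com/capacityType=SPOT. The node pool itself is assumed.
from kfp import dsl
from kubernetes.client import V1Toleration


@dsl.pipeline(name="spot-training-example")
def spot_training_pipeline():
    train_task = dsl.ContainerOp(
        name="train",
        image="python:3.9",                       # placeholder training image
        command=["python", "-c", "print('train')"],
    )
    # Schedule onto preemptible nodes and tolerate their taint.
    train_task.add_node_selector_constraint("cloud.google.com/gke-preemptible", "true")
    train_task.add_toleration(V1Toleration(
        key="cloud.google.com/gke-preemptible",
        operator="Equal",
        value="true",
        effect="NoSchedule",
    ))
    # Retry the step if the node is reclaimed mid-run.
    train_task.set_retry(3)
```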
- Project initiation begins with data discovery and validation of the client’s requirements against available data.
- Cloud environment selection is influenced by client infrastructure, business applications, and platform integrations rather than solely by technical features.
- Data cleaning, exploratory analysis, model prototyping, advanced model refinement, and deployment are handled collaboratively with data engineering and machine learning teams.
- The pipeline is gradually constructed in modular steps, facilitating scalable, automated retraining and integration with business applications.
- Advanced mathematics or statistics education provides a strong foundation for work in data science and machine learning.
- Master’s degrees in data science add the most value for candidates from non-technical undergraduate backgrounds; those with backgrounds in statistics, mathematics, or computer science may benefit more from self-study or targeted upskilling.
- When evaluating online or accelerated degree programs, candidates should scrutinize the curriculum, instructor engagement, and peer interaction to ensure comprehensive learning.