MLOps Week 2: Uber's Feature Store and Data Quality with Atindriyo Sanyal, Co-founder of Galileo
Jun 14, 2022
Atindriyo Sanyal, Co-founder of Galileo, discusses Uber's feature store and data quality in MLOps. Topics include automation in the ML model lifecycle, the ideal MLOps workflow, the significance of experimentation, a comparison of decentralized and centralized ML infrastructure, the evolution of feature stores, and contrasting approaches to data quality tooling.
MLOps automates the ML workflow while preserving flexibility and customizability in feature engineering and deployment.
Experimentation is crucial at different stages of the ML lifecycle, with a focus on optimizing models for specific use cases.
Deep dives
Applying DevOps principles to ML models
MLOps brings the discipline of DevOps, as practiced in application development, to machine learning models. It applies software engineering principles to automate the lifecycle of ML models, from pre-training and feature engineering through deployment and monitoring.
Unique aspects of MLOps compared to DevOps
MLOps has some nuances due to the APIs and libraries involved in building ML models. While it shares DevOps's software engineering principles, an MLOps platform must also provide the right abstractions, APIs, and endpoints for data scientists to build custom ML pipelines and to handle complex feature engineering and deployment at scale.
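To make the idea of such abstractions concrete, here is a minimal, purely illustrative sketch of the kind of feature-store API a platform might expose. The `FeatureStore` class and its method names are hypothetical, not Uber's or any real product's interface:

```python
# Hypothetical sketch of a feature-store abstraction: the class, method
# names, and in-memory storage are illustrative, not a real platform API.
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Toy in-memory feature store keyed by (entity_id, feature_name)."""
    _features: dict = field(default_factory=dict)

    def put(self, entity_id: str, name: str, value: float) -> None:
        self._features[(entity_id, name)] = value

    def get_vector(self, entity_id: str, names: list) -> list:
        # Fetch features in a fixed order, ready to feed a model.
        return [self._features.get((entity_id, n)) for n in names]


store = FeatureStore()
store.put("user_42", "avg_trip_distance_km", 7.3)
store.put("user_42", "trips_last_7d", 12.0)
vector = store.get_vector("user_42", ["avg_trip_distance_km", "trips_last_7d"])
```

A real feature store adds versioning, backfills, and a low-latency serving layer, but the core contract is the same: data scientists register features once and retrieve consistent vectors at training and serving time.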
The North Star for MLOps
The ultimate goal for MLOps is to automate and abstract the right parts of the ML workflow while providing flexibility and customizability in other areas. This includes automating feature management, deploying models at scale, and providing centralized monitoring and observability for various metrics like system performance, feature drift, and prediction drift.
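One common way to quantify the feature drift mentioned above is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against what the model sees in production. The sketch below uses stdlib Python and a common rule of thumb (PSI above roughly 0.2 signals significant drift); the threshold and binning scheme are assumptions, not from the episode:

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between two samples of one feature: bin on the expected
    sample's range, then sum (a - e) * ln(a / e) over the bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range production values into the edge bins.
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [float(x) for x in range(100)]        # distribution at training
production = [float(x) for x in range(50, 150)]  # shifted distribution
drift_score = population_stability_index(training, production)
```

A monitoring job would compute this per feature (and per prediction) on a schedule and alert when the score crosses the chosen threshold.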
Experimentation in the MLOps workflow
Experimentation is crucial throughout the ML lifecycle, occurring at different stages like selecting the right features, training, validation, evaluation, and testing. It focuses more on the left side of the workflow, where data scientists choose data, evaluate test sets, and optimize hyperparameters to develop the best models for their specific use cases.
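The experimentation loop described here, choosing features and tuning hyperparameters against a validation metric, can be sketched as a simple grid search. The quadratic `validation_loss` below is a toy stand-in for actually training and evaluating a model; the parameter names are illustrative:

```python
# Minimal experimentation-loop sketch: sweep a hyperparameter grid,
# score each candidate on a validation metric, and keep the best one.
from itertools import product


def validation_loss(learning_rate: float, regularization: float) -> float:
    # Toy stand-in for train-then-evaluate; lower is better. A real run
    # would fit a model and measure held-out error here.
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2


grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "regularization": [0.0, 0.01, 0.1],
}

# Expand the grid into candidate configurations and pick the minimizer.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
best = min(candidates, key=lambda params: validation_loss(**params))
```

In practice the same loop structure holds whether the search is a grid, random sampling, or Bayesian optimization; tracking each candidate and its score is what makes the experimentation reproducible.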