

171: Machine Learning Pipelines Are Still Data Pipelines with Sandy Ryza of Dagster
Jan 3, 2024
Guest Sandy Ryza, an expert in machine learning pipelines, discusses the role of orchestrators in the data lifecycle, changes in data ops and MLOps, and data cleaning, and gives an overview of Dagster. The conversation also covers the difference between data assets and tasks in data pipelines, how lineage and data assets are defined in Dagster, and the benefits of a unified orchestration framework. It closes with orchestration in the development phase and the emergence of the analytics engineer role.
Chapters
Introduction
00:00 • 4min
The Intersection of Data Ops and ML Ops
03:32 • 10min
DAGs and Data Assets in Data Pipelines
13:53 • 20min
Reasons to Use Dagster and dbt in Data Transformations
33:45 • 3min
Orchestrating ML and Data Processing Pipelines
36:31 • 17min
Enabling Collaboration and Breaking Technical Barriers in Machine Learning Pipelines
53:49 • 2min