171: Machine Learning Pipelines Are Still Data Pipelines with Sandy Ryza of Dagster

Jan 3, 2024

Guest Sandy Ryza, an expert in machine learning pipelines, discusses the role of orchestrators in the data lifecycle, recent changes in data ops and MLOps, and data cleaning, and gives an overview of Dagster. The conversation also covers the difference between data assets and tasks in data pipelines, how Dagster defines lineage and data assets, and the benefits of a unified orchestration framework. It closes with orchestration during the development phase and the emergence of the analytics engineer role.

Chapters

Introduction

00:00 • 4min

The Intersection of Data Ops and ML Ops

03:32 • 10min

DAGs and Data Assets in Data Pipelines

13:53 • 20min

Reasons to Use Dagster and dbt in Data Transformations

33:45 • 3min

Orchestrating ML and Data Processing Pipelines

36:31 • 17min

Enabling Collaboration and Breaking Technical Barriers in Machine Learning Pipelines

53:49 • 2min