

Episode 424: Sean Knapp on Dataflow Pipeline Automation
Sep 2, 2020
Chapters
Introduction
00:00 • 3min
How Many Data Pipelines Does a Business Have?
02:33 • 4min
Are You Using Data Pipelines for Machine Learning Models?
06:50 • 2min
Data Pipelines and ETL - Is There Something in Between?
09:13 • 2min
CDC: Change Data Capture
11:31 • 3min
Is Dataflow the Most Critical Problem Today for Data Engineering?
14:11 • 3min
The SLO of a Data Pipeline
17:31 • 2min
Spark Modeling - What Are Some of the Things That Can Go Wrong in a Data Pipeline?
19:03 • 2min
Is There a Risk of Cascading Failures?
20:46 • 2min
How to Model a Data Pipeline?
23:09 • 3min
Is the DAG of All the Steps?
26:35 • 2min
Is There a Job Description of the Data Pipeline Analyst?
28:21 • 3min
The Biggest Case for Automation?
31:11 • 3min
What Is the High Level Architecture of a Pipeline Automation Engine?
33:46 • 2min
Data Pipelines
35:38 • 2min
Generic Automation Engines - How Do They Interface With Query Languages?
37:59 • 3min
Is There a Way to Integrate With a Legacy System?
40:38 • 2min
Message Delivery - Is That the Right Kind of Guarantees?
42:20 • 2min
Is Automation a Good Idea?
44:46 • 3min
Monitoring Your Data Pipelines
47:48 • 4min
Is There a Need for Audit Trails in Distributed Tracing?
51:32 • 2min
Is There a Need for More Advanced Scheduling and Orchestration?
53:26 • 5min