Software Engineering Radio - the podcast for professional software developers

Episode 424: Sean Knapp on Dataflow Pipeline Automation

Sep 2, 2020
Chapters
1
Introduction
00:00 • 3min
2
How Many Data Pipelines Does a Business Have?
02:33 • 4min
3
Are You Using Data Pipelines for Machine Learning Models?
06:50 • 2min
4
Data Pipelines and ETL - Is There Something in Between?
09:13 • 2min
5
CDC: Change Data Capture
11:31 • 3min
6
Is Dataflow the Most Critical Problem Today for Data Engineering?
14:11 • 3min
7
The SLO of a Data Pipeline
17:31 • 2min
8
Spark Modeling - What Are Some of the Things That Can Go Wrong in a Data Pipeline?
19:03 • 2min
9
Is There a Risk of Cascading Failures?
20:46 • 2min
10
How to Model a Data Pipeline?
23:09 • 3min
11
The DAG of All the Steps
26:35 • 2min
12
Is There a Job Description for the Data Pipeline Analyst?
28:21 • 3min
13
What Is the Biggest Case for Automation?
31:11 • 3min
14
What Is the High Level Architecture of a Pipeline Automation Engine?
33:46 • 2min
15
Data Pipelines
35:38 • 2min
16
Generic Automation Engines - How Do They Interface With Query Languages?
37:59 • 3min
17
Is There a Way to Integrate With a Legacy System?
40:38 • 2min
18
Message Delivery - Are Those the Right Kinds of Guarantees?
42:20 • 2min
19
Is Automation a Good Idea?
44:46 • 3min
20
Monitoring Your Data Pipelines
47:48 • 4min
21
Is There a Need for Audit Trails in Distributed Tracing?
51:32 • 2min
22
Is There a Need for More Advanced Scheduling and Orchestration?
53:26 • 5min