Episode 424: Sean Knapp on Dataflow Pipeline Automation

Software Engineering Radio - the podcast for professional software developers

Spark Modeling - What Are Some of the Things That Can Go Wrong in a Data Pipeline?

Pipelines tend to be far more brittle in nature for a handful of reasons. No nodes are fully isolated: other Spark jobs are running on those same nodes at the same time. The other thing you'll often see is bad data, where all of a sudden a corrupted record comes through.
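
As a rough illustration of the bad-data case, here is a minimal sketch in plain Python (not Spark, and not anything described in the episode) of one common defensive pattern: parse each record on its own and set corrupted ones aside instead of letting a single bad record fail the whole run. The function name `split_records` and the JSON-lines input format are assumptions made for this example.

```python
import json


def split_records(lines):
    """Parse JSON lines, separating valid records from corrupted ones.

    Hypothetical helper for illustration: returns (good, bad), where
    `bad` holds (line_number, raw_line, error_message) tuples.
    """
    good, bad = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            # Quarantine the corrupted record and keep processing the rest.
            bad.append((line_number, line, str(err)))
            continue
        good.append(record)
    return good, bad


# The second line is truncated, standing in for a corrupted record.
raw = ['{"id": 1}', '{"id": 2', '{"id": 3}']
good, bad = split_records(raw)
```

With this input, `good` holds the two parseable records and `bad` holds the truncated line along with its line number, so the bad record can be inspected or replayed later rather than silently dropped.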
