Data Engineering Podcast

Pachyderm with Daniel Whitenack - Episode 1

Jan 14, 2017
INSIGHT

Provenance Enables Accountability

  • Versioning data alongside analysis code enables tracking exact data provenance.
  • This capability is crucial for accountability, compliance, and collaborative reproducibility.
INSIGHT

Efficient Versioning and Performance

  • Pachyderm stores data commits as differential changes, which makes versioning space-efficient.
  • Each pipeline runs against a specific data commit, so it does not have to scan through historical data, which improves performance.
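
To make the idea of differential commits concrete, here is a minimal toy sketch in Python. It is not Pachyderm's actual storage format or API; the `Commit` class and `materialize` function are hypothetical names invented for illustration. Each commit records only what changed relative to its parent, and a full view of the data at any commit is rebuilt by replaying the chain of diffs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Commit:
    """Hypothetical commit: stores only the changes relative to its parent."""
    parent: Optional["Commit"]
    changed: Dict[str, bytes] = field(default_factory=dict)  # added or modified files
    deleted: Set[str] = field(default_factory=set)           # removed paths


def materialize(commit: Optional[Commit]) -> Dict[str, bytes]:
    """Rebuild the full file view at `commit` by replaying diffs from the root."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = commit.parent
    files: Dict[str, bytes] = {}
    for c in reversed(chain):  # oldest commit first
        for path in c.deleted:
            files.pop(path, None)
        files.update(c.changed)
    return files


# Three commits: initial load, an update to one file, and a deletion.
c1 = Commit(parent=None, changed={"a.csv": b"1,2\n", "b.csv": b"3,4\n"})
c2 = Commit(parent=c1, changed={"a.csv": b"1,2\n5,6\n"})
c3 = Commit(parent=c2, deleted={"b.csv"})

view_c2 = materialize(c2)  # what a job pinned to c2 would see
view_c3 = materialize(c3)  # what a job pinned to c3 would see
```

In this sketch, `c2` stores only the new contents of `a.csv` rather than a second copy of the whole dataset, and a job pinned to `c2` always sees the same view regardless of later commits.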
INSIGHT

Language Agnostic Power

  • Pachyderm supports any language or framework by using container-based pipelines.
  • This flexibility bridges the gap between data engineers and data scientists who prefer different tools.