The Real Python Podcast

Orchestrating Large and Small Projects With Apache Airflow

Jan 27, 2023
ANECDOTE

Health System Data Integration

  • Calvin worked on a three-year health data project that involved the acquisition of multiple health systems, each running its own disparate software.
  • The team normalized diverse data sources, such as HR and medical records, into a common format and centralized them for analytics.
INSIGHT

ETL: Standardize Over Calculate

  • Data transformations mostly involved standardizing column names and data formats.
  • Complex calculations were rare; normalization focused on consistent formatting for analytics readiness.
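
The kind of standardization described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used on the project: the function names, the sample field names, the list of accepted date formats, and the rule that keys ending in `_date` hold dates are all assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical date formats that different source systems might emit (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def standardize_column(name: str) -> str:
    """Convert a column name such as 'Hire Date' or 'hireDate' to snake_case."""
    # Insert an underscore at lower-to-upper boundaries: 'hireDate' -> 'hire_Date'.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into a single underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def standardize_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def standardize_record(record: dict) -> dict:
    """Rename every key; reformat string values whose new key ends in '_date'."""
    result = {}
    for key, value in record.items():
        new_key = standardize_column(key)
        if new_key.endswith("_date") and isinstance(value, str):
            value = standardize_date(value)
        result[new_key] = value
    return result


# Two records for the same kind of data, as two source systems might export them.
hr_row = {"Hire Date": "01/27/2023", "employeeID": 7}
other_row = {"hire_date": "20230127", "Employee ID": 7}
print(standardize_record(hr_row))
print(standardize_record(other_row))
```

Both sample rows end up with the same keys and the same date format, which is the point of the snip: no calculation takes place, only consistent naming and formatting.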
ADVICE

Leverage Airflow for Orchestration

  • Use Airflow as an orchestrator to define and manage workflows with Python DAG files.
  • Place DAG Python files in the DAGs folder that Airflow scans so they are registered automatically and can then be scheduled and triggered from the UI.