Data Engineering Podcast

How Shopify Is Building Their Production Data Warehouse Using DBT

Feb 9, 2021
INSIGHT

Wide Variety Of Source Data

  • Shopify ingests many sources: sharded MySQL, hundreds of third-party APIs, Kafka streams, and occasional dumps.
  • This diversity demands flexible, declarative tooling to unify processing and analysis.
ANECDOTE

Slow Iteration With PySpark Prototype Workflow

  • Starscream, Shopify's PySpark-based platform, required rewriting SQL prototypes in PySpark, which added weeks between prototype and production.
  • That pain motivated a SQL-first workflow, so analysts can stay in SQL from prototype through production and iterate faster.
ADVICE

Use Open, Declarative Tooling

  • Prefer existing, open tools rather than reimplementing orchestration unless you must.
  • Choose declarative, SQL-first systems so users focus on modeling, not platform internals.
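
To make "declarative, SQL-first" concrete, here is a minimal sketch of the idea, assuming nothing about Shopify's actual setup or about dbt's API. Each model is only a name, its dependencies, and a SELECT statement; a small runner works out the build order and materializes the tables. The model names, the `raw_orders` table, and the `build` and `demo` functions are all hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> (upstream model names, SELECT statement).
# The author of a model writes only SQL and declares what it depends on.
MODELS = {
    "stg_orders": (
        (),
        "SELECT id, shop_id, amount_cents / 100.0 AS amount FROM raw_orders",
    ),
    "shop_revenue": (
        ("stg_orders",),
        "SELECT shop_id, SUM(amount) AS revenue FROM stg_orders GROUP BY shop_id",
    ),
}


def build(conn, models):
    """Materialize every model as a table, upstream models first."""
    graph = {name: set(deps) for name, (deps, _sql) in models.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        _deps, sql = models[name]
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {sql}")
    return order


def demo():
    """Load a few made-up rows, build the models, and read the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, shop_id INTEGER, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, 10, 500), (2, 10, 250), (3, 20, 1000)],
    )
    order = build(conn, MODELS)
    rows = conn.execute(
        "SELECT shop_id, revenue FROM shop_revenue ORDER BY shop_id"
    ).fetchall()
    return order, rows


if __name__ == "__main__":
    print(demo())
```

The point of the design is that adding a model means adding one SQL entry; ordering and materialization stay in the runner, so analysts never touch platform internals.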