Data Renegades

Ep. #4, Streaming Made Practical with Micah Wylde

Dec 9, 2025
Micah Wylde, founder of Arroyo and former engineer at Cloudflare, shares his journey from building fraud detection at Sift Science to creating massive real-time systems at Lyft. He discusses the complexity of streaming systems, emphasizing that schema evolution remains a tough challenge. Micah argues for SQL-first streaming, critiques current CDC tooling, and highlights the importance of treating data outputs as products. He also predicts a shift in data architecture towards open formats and the impact of AI on data consumption.
ANECDOTE

Early Fraud Work Shaped His Data View

  • Micah Wylde started in data building fraud detection at Sift Science using custom in-house systems and HBase for low-latency features.
  • He learned deep lessons about data storage design and bespoke Java processing that shaped his approach to real-time systems.
ANECDOTE

Running Flink At Lyft Exposed Usability Gaps

  • At Lyft, Micah ran Flink at tens of millions of events per second and tried to offer it as a self-serve platform to engineers who were not streaming experts.
  • He found only a handful of engineers could use Flink without heavy platform team involvement, sparking Arroyo's creation.
ADVICE

Treat Schema Evolution As The Core Problem

  • Prioritize schema and change management, because safely changing a running pipeline is the hardest problem in data engineering.
  • Invest in testing, catalogs, and deployment practices to detect and evolve schemas safely.
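
One way to picture the "detect" half of this advice is a compatibility check that runs in tests or CI before a pipeline change ships. The sketch below is an illustration only, not something described in the episode and not how Arroyo or Flink handle schemas. The `Field` class, the `breaking_changes` function, and the three rules (no removed fields, no type changes, new fields need a default) are all hypothetical, chosen to resemble the kind of backward-compatibility rules schema registries commonly apply.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Sentinel meaning "no default was given", so None can still be a real default.
_MISSING = object()


@dataclass(frozen=True)
class Field:
    """Hypothetical field description: a type name plus an optional default."""
    dtype: str
    default: Any = _MISSING

    @property
    def has_default(self) -> bool:
        return self.default is not _MISSING


# A schema here is just a mapping from field name to Field (an assumption).
Schema = Dict[str, Field]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the changes that would break consumers still reading with `old`."""
    problems: List[str] = []
    for name, field in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name].dtype != field.dtype:
            problems.append(
                f"type change on {name}: {field.dtype} -> {new[name].dtype}"
            )
    for name, field in new.items():
        # Added fields are only safe if old records can be filled in.
        if name not in old and not field.has_default:
            problems.append(f"new required field without default: {name}")
    return problems


# Illustrative schemas; the field names are made up for this example.
v1 = {"ride_id": Field("string"), "fare": Field("double")}
v2_bad = {
    "ride_id": Field("string"),
    "fare": Field("string"),   # type changed
    "tip": Field("double"),    # added with no default
}
v2_ok = {
    "ride_id": Field("string"),
    "fare": Field("double"),
    "tip": Field("double", default=0.0),
}

if __name__ == "__main__":
    print(breaking_changes(v1, v2_bad))
    print(breaking_changes(v1, v2_ok))
```

A check like this would typically fail the deployment when the returned list is non-empty, which turns schema evolution into a reviewed step rather than a surprise for downstream consumers.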