Tech on the Rocks

How Denormalized is Building ‘DuckDB for Streaming’ with Apache DataFusion

Sep 13, 2024
Amey Chaugule and Matt Green, co-founders of Denormalized, share their extensive engineering backgrounds from top tech firms. They discuss the creation of an embedded stream processing engine designed to simplify real-time data workloads. The duo tackles challenges in existing systems like Spark and Kafka, emphasizing developer experience and state management. They also compare DuckDB and SQLite in the context of streaming data, highlighting the future of user-friendly data tools and the importance of fault tolerance in modern applications.
01:02:01

Podcast summary created with Snipd AI

Quick takeaways

  • Denormalized is developing an embedded stream processing engine that simplifies real-time data workloads by leveraging Apache DataFusion's single-node capabilities.
  • Achieving fault tolerance in streaming systems is hard enough that practitioners often skip checkpointing altogether, which raises concerns about the reliability of continuous data processing.

Deep dives

Evolution of Streaming Systems

Streaming systems have evolved significantly, with various frameworks emerging to handle stream processing workloads. Amey and Matt of Denormalized have worked on multiple platforms, notably at Uber, where they dealt with massive Kafka deployments. That experience revealed the complexities of real-time data processing and the difficulty of achieving fault tolerance. They explore how traditional assumptions about streaming systems, particularly around fault tolerance, may not match actual engineering practice today.
