
The Hedgineer Podcast
DuckDB, Apache Arrow, & the Future of Data Engineering w/ Rusty Conover | S2E3
Sep 9, 2025
Rusty Conover, a data engineering ace and prolific DuckDB extension creator, delves into the transformative power of DuckDB, emphasizing its speed and simplicity. He explains how the in-process architecture challenges traditional big data systems and explores the synergy with Apache Arrow. Rusty also shares insights on his 15 extensions, including Airport for data integration, and discusses the future of open table formats like Iceberg and Delta Lake. The conversation reveals DuckDB's potential to revolutionize analytics and replace complex ETL processes.
DuckDB As Engine And Protocol
- DuckDB is both a fast C++ execution engine and a composable protocol that unifies access to many data sources.
- Its plugin-based scanners let one query interface read CSV, Parquet, Postgres, and more without a separate federated query engine.
Use Arrow Flight For Zero-Copy RPC
- Use Apache Arrow Flight for efficient zero-copy RPC of columnar data between services.
- Push and pull Arrow record batches so DuckDB can insert into or read from remote Arrow servers with minimal copies.
Transport Decoupled From Storage
- Arrow Flight servers can be memory-resident or persist to disk in any format you choose.
- That decouples transport from storage so servers can write to Iceberg, Delta Lake, CSV, etc.
