
The Hedgineer Podcast
DuckDB, Apache Arrow, & the Future of Data Engineering w/ Rusty Conover | S2E3
Sep 9, 2025
Rusty Conover, a data engineering ace and prolific DuckDB extension creator, delves into the transformative power of DuckDB, emphasizing its speed and simplicity. He explains how the in-process architecture challenges traditional big data systems and explores the synergy with Apache Arrow. Rusty also shares insights on his 15 extensions, including Airport for data integration, and discusses the future of open table formats like Iceberg and Delta Lake. The conversation reveals DuckDB's potential to revolutionize analytics and replace complex ETL processes.
AI Snips
Use Data Sketches For Large-Scale Approximation
- Use data sketches for approximate analytics when exactness isn't required and memory is limited.
- Sketches estimate distributions, heavy hitters, and quantiles in small, bounded memory, even across billions of rows.
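To make the heavy-hitters idea concrete, here is a minimal sketch in plain Python using the Misra-Gries summary. This is a stand-in chosen for brevity, not the sketch implementation discussed in the episode; the function name `misra_gries` and the sample stream are assumptions for illustration. With `k - 1` counters, any item occurring more than `n / k` times in a stream of `n` items is guaranteed to survive, and each reported count undercounts the true count by at most `n / k`.

```python
import random


def misra_gries(stream, k):
    """Approximate heavy hitters using at most k - 1 counters.

    Hypothetical helper for illustration: any item whose true frequency
    exceeds n / k (n = stream length) is guaranteed to be in the result.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Made-up stream: two frequent items plus 100 items that appear once each.
stream = ["a"] * 600 + ["b"] * 300 + [f"x{i}" for i in range(100)]
random.Random(0).shuffle(stream)

summary = misra_gries(stream, k=5)
print(summary)
```

Memory stays bounded by `k` regardless of stream length, which is the trade the bullets describe: a small, fixed footprint in exchange for approximate counts.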
Arrow As A Columnar Interchange
- Arrow is an in-memory columnar format built around record batches; its libraries can read and write Parquet, ORC, CSV, and Arrow's own IPC format.
- Arrow acts as a lingua franca between compute engines and avoids repeated copies and format conversions.
Port Queries Between Engines
- Different columnar engines (ClickHouse, DuckDB) have trade-offs; SQL translation tools like SQLGlot ease portability.
- Consider rewriting queries for another engine when a different backend yields better performance or cost.
