Importance of Distributed Query Processing in DataFusion
Distributed query processing is crucial to the advancement of DataFusion, which might soon become an independent Apache project. Adding it would allow DataFusion to compete effectively in the large-scale data warehousing sector. Other developments, such as geospatial support in Parquet, are also seen as significant, and the debate over whether Parquet or ORC is the preferred columnar serialization format appears to have settled into a general consensus.
Building a database engine requires a substantial investment of engineering effort and time. Decades of research and development on these systems have produced a number of common components that are shared across implementations. When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, DataFusion, and Parquet to lay the foundation for the newest version of his time-series database.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA