Choosing Arrow for InfluxDB Data Handling
The move to Parquet as the persistence format for InfluxDB was driven by the need for efficient compression and compatibility with third-party systems. Arrow was selected as the in-memory format to enable fast analytical queries over high-cardinality time series data, which were slow in previous versions of InfluxDB. Adopting Arrow in memory also opened the door to integration with a range of query engines; DuckDB was among those considered.
Building a database engine requires a substantial investment of engineering effort and time. Decades of research and development on these systems have produced a number of common components that are shared across implementations. When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, DataFusion, and Parquet to lay the foundation of the newest version of his time-series database.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA