Dagster offers a new approach to building and running data platforms as an open-source, cloud-native orchestrator for data pipelines. It provides tools for the whole development lifecycle, including integrated lineage and observability, a declarative programming model, and robust testability. Teams can get started quickly with Dagster Cloud, an enterprise-class hosted offering with serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments.
InfluxDB is a time series database created by Paul Dix, the founder and CTO of InfluxData. Its origins trace back to a FinTech startup in New York City, where the need to handle time series data led to the development of InfluxDB. The database has evolved significantly, with version 1.0 in 2016, 2.0 in 2019, and the recent 3.0, a substantial rewrite focused on high-cardinality data support and improved storage mechanisms.
Paul Dix's investment in the Apache Arrow ecosystem led to a new database design built on the FDAP stack: Flight, DataFusion, Arrow, and Parquet. Arrow is both the in-memory columnar specification and the umbrella project, Flight (with Flight SQL layered on top) is the protocol for moving Arrow data over the network, DataFusion is the SQL query engine, and Parquet is the columnar file format. The stack seeks to provide an adaptable architecture for database engines, emphasizing performance and ecosystem compatibility.
Building database systems involves complex challenges, and projects often take longer than expected. The adoption of Arrow and DataFusion by large technology companies to meet their scaling needs underscores the stack's potential. Contributing back to and integrating with the larger community projects remains crucial, highlighting the importance of open-source collaboration in driving the technology forward.
The future of data systems may see data warehousing and stream processing converge, closing the gap for real-time analytics. Apache Iceberg integration, advances in distributed processing, and an open-source monolithic version of InfluxDB all signal exciting developments in data management. The evolving landscape underlines the role of innovative technology stacks in shaping next-generation data solutions.
Building a database engine requires a substantial amount of engineering effort and time investment. Across decades of research and development into these software systems, a number of common components have come to be shared across implementations. When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, DataFusion, and Parquet to lay the foundation of the newest version of his time-series database.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA