Modern OLAP Database System Design with FDAP (Andrew Lamb)

Jun 5, 2024

Andrew Lamb, Staff Software Engineer at InfluxDB and chair of the Apache DataFusion project, shares his expertise on modern OLAP database design. He explains the power of the FDAP stack, highlighting how Apache Parquet and Arrow make data storage and retrieval more efficient. The conversation covers the challenges of data immutability and management, and Flight's role in simplifying data transfer. Looking ahead, Andrew discusses where database and analytics technologies are heading.

INSIGHT

Analytics Workload Focus

Analytics workloads focus on throughput (rows per second) rather than individual transactions.
They involve large-scale aggregations, statistics computation, and data slicing for various consumers.

INSIGHT

Bottlenecks in Traditional Analytics

Traditional analytical systems face bottlenecks due to increasing data volumes and velocity.
FDAP aims to address these limitations by providing pre-built, optimized components.

ANECDOTE

Genesis of FDAP

Paul Dix, seeking to rebuild InfluxDB, wanted an analytics engine with columnar storage and other features.
He found that re-implementing these common components was expensive, leading to the creation of FDAP.

In this video I speak with Andrew Lamb, Staff Software Engineer at InfluxDB. We discuss the FDAP (Flight, DataFusion, Arrow, Parquet) stack for modern OLAP database system design. Andrew shares insights into why the FDAP stack is so powerful for designing and implementing a modern OLAP database.

Chapters:
00:00 Introduction
01:48 Understanding Analytics: Transactional vs Analytical Databases
04:41 The Genesis and Goals of the FDAP Stack
09:31 Decoding FDAP: Flight, DataFusion, Arrow, and Parquet
12:40 Apache Parquet: Revolutionizing Columnar Storage
17:18 Apache Arrow: The In-Memory Game Changer
23:51 Interoperability and Migration with Apache Arrow
27:10 Comparing Apache Parquet and Arrow
28:26 Exploring Data Mutability in Analytic Systems
29:19 Handling Data Updates and Deletions
29:24 The Role of Immutable Storage in Analytics
30:42 Optimizing Data Storage and Mutation Strategies
34:20 Introducing Flight: Simplifying Data Transfer
35:02 Deep Dive into Flight's Benefits and SQL Support
39:20 Unpacking DataFusion's SQL Support and Extensibility
46:12 The Interplay of FDAP Components in Analytics
51:49 Future Directions and Innovations in Data Analytics
56:04 Concluding Thoughts on FDAP and Its Impact

FDAP Stack: https://www.influxdata.com/glossary/fdap-stack/
FDAP Blog: https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
InfluxDB: https://www.influxdata.com/

Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator

If you like this episode, please hit the like button and share it with your network. Also, please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

#datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign