
The Real Python Podcast: Decoupling Systems to Get Closer to the Data
Apr 19, 2024
Phillip Cloud, the lead maintainer of Ibis, discusses the benefits of decoupled data processing systems, reusable queries, and accessing example data sets. He contrasts Ibis's workflow with other Python dataframe libraries, shares his journey with open source projects, and explores backend database selection for data analysis.
From Facebook SQL Pain To Ibis Contribution
- Phillip Cloud discovered Ibis at Facebook, where he needed SQL that could be reused across Presto and Hive and wanted one Python workflow for both.
- After finding Ibis, Wes McKinney's project, he joined it and implemented the Postgres backend as his first major contribution.
First Open Source Win With Pandas
- Phillip's first major OSS contribution was an FFT-based cross-correlation function for pandas, written while he was in grad school.
- That experience taught him to open issues, share code, and follow contributor guidance to land changes.
Compile Expressions, Avoid Intermediate Allocations
- Ibis builds expression trees and compiles them to SQL, so the engine can optimize the whole query instead of materializing an intermediate result at every step.
- This reduces memory churn and can run far more efficiently than eager pandas operations.
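
The idea of building an expression tree first and compiling it to SQL later can be sketched with a toy example. This is not Ibis's API: the `Table`, `Filter`, `SumBy`, and `compile_sql` names are hypothetical and exist only for illustration, and SQLite from the Python standard library stands in for the backend engine.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical toy expression nodes (not Ibis classes). Building them
# only records what was asked for; no data is touched.
@dataclass(frozen=True)
class Table:
    name: str

    def filter(self, predicate: str) -> "Filter":
        return Filter(self, predicate)


@dataclass(frozen=True)
class Filter:
    source: Table
    predicate: str

    def sum_by(self, key: str, column: str) -> "SumBy":
        return SumBy(self, key, column)


@dataclass(frozen=True)
class SumBy:
    source: Filter
    key: str
    column: str


def compile_sql(expr: SumBy) -> str:
    """Walk the tree once and emit a single SQL statement."""
    flt = expr.source
    return (
        f"SELECT {expr.key}, SUM({expr.column}) AS total "
        f"FROM {flt.source.name} WHERE {flt.predicate} "
        f"GROUP BY {expr.key} ORDER BY {expr.key}"
    )


# Describe the whole computation up front; nothing runs yet.
expr = Table("events").filter("amount > 10").sum_by("kind", "amount")
sql = compile_sql(expr)

# Hand the complete query to the engine, which filters and aggregates
# in one pass without a Python-side intermediate table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (kind TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 5), ("a", 20), ("b", 30), ("b", 40)],
)
rows = con.execute(sql).fetchall()
print(sql)
print(rows)
```

An eager dataframe workflow would typically allocate the filtered table before grouping it. Here the filter and the aggregation reach the engine as one statement, so the engine decides how to execute them together.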

