The Real Python Podcast

Decoupling Systems to Get Closer to the Data

Apr 19, 2024
Phillip Cloud, the lead maintainer of Ibis, discusses the benefits of decoupled data processing systems, reusable queries, and accessing example data sets. He contrasts Ibis's workflow with other Python dataframe libraries, shares his journey with open source projects, and explores backend database selection for data analysis.
ANECDOTE

From Facebook SQL Pain To Ibis Contribution

  • Phillip Cloud discovered Ibis at Facebook while looking for a way to reuse SQL across Presto and Hive; he wanted a single Python workflow for both engines.
  • After finding Wes McKinney's project, he joined it and implemented the Postgres backend as his first major contribution.
ANECDOTE

First Open Source Win With Pandas

  • Phillip's first major OSS contribution was to pandas for an FFT-based cross-correlation function while in grad school.
  • That experience taught him to open issues, share code, and follow contributor guidance to land changes.
INSIGHT

Compile Expressions, Avoid Intermediate Allocations

  • Ibis builds expression trees and compiles them to SQL, so engines can optimize whole queries instead of materializing intermediate results.
  • This reduces memory churn and can run far more efficiently than eager pandas operations, which allocate a new result at every step.
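The deferred-expression idea above can be sketched in plain Python. This is a hypothetical toy, not Ibis's API: the `Expr`, `Table`, `Filter`, `Select`, and `compile_sql` names are invented for illustration. Method calls only record nodes in a tree and allocate no intermediate data; nothing runs until `compile_sql` walks the tree and emits a single SQL statement that an engine could then optimize as a whole.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical toy expression tree (NOT Ibis's real classes).
@dataclass(frozen=True)
class Expr:
    """Base node: building an expression records intent and executes nothing."""

    def filter(self, predicate: str) -> Expr:
        return Filter(self, predicate)

    def select(self, *columns: str) -> Expr:
        return Select(self, columns)


@dataclass(frozen=True)
class Table(Expr):
    name: str


@dataclass(frozen=True)
class Filter(Expr):
    parent: Expr
    predicate: str


@dataclass(frozen=True)
class Select(Expr):
    parent: Expr
    columns: tuple[str, ...]


def compile_sql(expr: Expr) -> str:
    """Walk the tree once (outermost node first) and emit one SQL statement."""
    columns: tuple[str, ...] | None = None
    predicates: list[str] = []
    node = expr
    while not isinstance(node, Table):
        if isinstance(node, Select):
            if columns is None:
                columns = node.columns  # the outermost projection decides the output
        elif isinstance(node, Filter):
            predicates.append(node.predicate)
        else:
            raise TypeError(f"unsupported node: {node!r}")
        node = node.parent
    sql = f"SELECT {', '.join(columns or ('*',))} FROM {node.name}"
    if predicates:
        # Predicates were collected outermost-first; reverse to keep source order.
        sql += " WHERE " + " AND ".join(f"({p})" for p in reversed(predicates))
    return sql


orders = Table("orders")
expr = orders.filter("amount > 100").filter("region = 'EU'").select("id", "amount")
print(compile_sql(expr))
# SELECT id, amount FROM orders WHERE (amount > 100) AND (region = 'EU')
```

Because the two filters and the projection arrive at the engine as one statement, its planner is free to push predicates down or skip unneeded columns, which an eager, step-by-step evaluation cannot do.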