Super Data Science: ML & AI Podcast with Jon Krohn

815: Polars: Faster DataFrame Ops, with Marco Gorelli

Sep 3, 2024
In this enlightening discussion, Marco Gorelli, a Senior Software Engineer at Quansight Labs and a core developer of the Polars and Narwhals libraries, shares his insights on optimizing data operations. He explains when to choose Polars over Pandas and walks through Polars features such as lazy evaluation and string optimizations. Marco also delves into Narwhals, a lightweight compatibility layer that lets libraries support both Polars and Pandas dataframes with a single codebase. He shares his strategies for winning forecasting competitions and addresses the need for greater diversity in data science. Prepare for a deep dive into the future of data manipulation!
ADVICE

Polars vs. Pandas

  • Stick with Pandas for existing projects that already run efficiently.
  • Consider Polars for new data science projects for more readable code and potential performance gains.
INSIGHT

Polars Lazy Evaluation

  • Polars expressions optimize data processing by deferring evaluation until the results are actually needed.
  • Because the whole query is known before it runs, this lazy evaluation enables query optimization and parallelization, yielding speed and memory improvements.
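To make the idea concrete, here is a toy Python model of lazy evaluation. It is not Polars code and does not depend on Polars: the names `LazyTable`, `optimized_plan`, and `collect` are hypothetical. Each method only records a step in a plan, nothing runs until `collect()`, and because the full plan is visible up front the model can apply one optimization that Polars' query optimizer also performs, predicate pushdown (running filters before later steps so fewer rows flow through them). Treat it as a sketch of the principle, assuming a plan made only of filters and column selections.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class LazyTable:
    """Toy lazy table (hypothetical): methods record steps; nothing runs until collect()."""
    rows: tuple[Row, ...]
    plan: tuple[tuple[str, object], ...] = ()

    def filter(self, column: str, predicate: Callable[[object], bool]) -> "LazyTable":
        # Record the step and return a new LazyTable; no rows are touched yet.
        return LazyTable(self.rows, self.plan + (("filter", (column, predicate)),))

    def select(self, *columns: str) -> "LazyTable":
        return LazyTable(self.rows, self.plan + (("select", columns),))

    def optimized_plan(self) -> list[tuple[str, object]]:
        # Predicate pushdown: run every filter before any projection.
        # Reasonable here because select() only drops columns, so each filter's
        # column still exists in the source rows.
        filters = [step for step in self.plan if step[0] == "filter"]
        projections = [step for step in self.plan if step[0] == "select"]
        return filters + projections

    def collect(self) -> list[Row]:
        # Execution happens only here, following the optimized plan.
        rows = list(self.rows)
        for kind, arg in self.optimized_plan():
            if kind == "filter":
                column, predicate = arg
                rows = [r for r in rows if predicate(r[column])]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


sales = LazyTable((
    {"store": "a", "sales": 10},
    {"store": "b", "sales": 20},
    {"store": "a", "sales": 30},
))
query = sales.select("store", "sales").filter("sales", lambda v: v > 15).select("store")
print([kind for kind, _ in query.plan])              # ['select', 'filter', 'select']
print([kind for kind, _ in query.optimized_plan()])  # ['filter', 'select', 'select']
print(query.collect())                               # [{'store': 'b'}, {'store': 'a'}]
```

In Polars itself, the lazy API is reached through `LazyFrame` (for example via `pl.scan_csv(...)` or `DataFrame.lazy()`), the optimized plan can be inspected with `.explain()`, and execution happens at `.collect()`.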
INSIGHT

String Optimization in Polars

  • Polars optimizes string operations using specialized string storage that differs from the formats used by Pandas and NumPy.
  • This leads to substantial performance gains, even in queries without explicit string operations.
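The snip does not describe the storage format. Assuming it refers to the "string view" (also called German-style string) layout that Apache Arrow specifies as `Utf8View`, which to our understanding Polars adopted for its `String` type in 2024, the toy model below sketches why such a layout helps. Pandas, by contrast, has traditionally kept strings in `object` columns that hold pointers to individual Python `str` objects. Every name here (`StringView`, `StringViewColumn`, `buffer_reads`) is hypothetical, and the model simplifies Arrow's layout: real views are packed 16-byte structs and long strings can span several data buffers.

```python
from dataclasses import dataclass
from typing import Optional

INLINE_LIMIT = 12  # Arrow string views store strings of up to 12 bytes inline


@dataclass(frozen=True)
class StringView:
    length: int              # byte length of the string
    prefix: bytes            # first (up to) 4 bytes, kept in the view for fast comparisons
    inline: Optional[bytes]  # whole payload when length <= INLINE_LIMIT, else None
    offset: int              # start in the shared buffer for long strings, else -1


class StringViewColumn:
    """Toy model (hypothetical): fixed-size views plus one shared byte buffer."""

    def __init__(self, values: list[str]) -> None:
        self.buffer = bytearray()
        self.views: list[StringView] = []
        self.buffer_reads = 0  # counts how often the shared buffer had to be read
        for value in values:
            data = value.encode("utf-8")
            if len(data) <= INLINE_LIMIT:
                # Short strings live entirely inside the view.
                self.views.append(StringView(len(data), data[:4], data, -1))
            else:
                # Long strings keep only a prefix in the view; bytes go to the buffer.
                self.views.append(StringView(len(data), data[:4], None, len(self.buffer)))
                self.buffer += data

    def _payload(self, view: StringView) -> bytes:
        if view.inline is not None:
            return view.inline
        self.buffer_reads += 1
        return bytes(self.buffer[view.offset:view.offset + view.length])

    def get(self, index: int) -> str:
        return self._payload(self.views[index]).decode("utf-8")

    def equal(self, i: int, j: int) -> bool:
        a, b = self.views[i], self.views[j]
        # Length and prefix live in the view, so many mismatches are decided
        # without touching the shared buffer.
        if a.length != b.length or a.prefix != b.prefix:
            return False
        return self._payload(a) == self._payload(b)


col = StringViewColumn(["short", "a much longer string", "another long string!"])
print(col.views[0].inline)  # b'short'
print(col.equal(1, 2))      # False: same length, but the 4-byte prefixes differ
print(col.buffer_reads)     # 0
print(col.get(1))           # a much longer string
```

Under these assumptions, short values such as category codes never leave the view, and many comparisons made while sorting, joining, or grouping on string keys are settled from length and prefix alone. That is one plausible reason queries with no explicit string operations can still benefit.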