

815: Polars: Faster DataFrame Ops, with Marco Gorelli
Sep 3, 2024
Marco Gorelli, a Senior Software Engineer at Quansight Labs and a core developer of the Polars and Narwhals libraries, shares his insights on optimizing data operations. He explains when to use Polars over Pandas and covers Polars features such as lazy evaluation and string optimizations. Marco also delves into the Narwhals library, which bridges compatibility with Pandas. He shares his strategies for winning forecasting competitions and addresses the need for greater diversity in data science.
AI Snips
Polars vs. Pandas
- Stick with Pandas for existing projects that already work efficiently.
- Consider Polars for new data science projects, both for cleaner code and for potential performance gains.
Polars Lazy Evaluation
- Polars can delay evaluating expressions until the result is actually needed.
- Because the whole query is known before it runs, Polars can optimize and parallelize it, yielding speed and memory improvements.
String Optimization in Polars
- Polars optimizes string operations by using a specialized string storage format that differs from the one used by Pandas and NumPy.
- This leads to substantial performance gains, even in queries without explicit string operations.