What are the advantages of using Polars for your Python data projects? When should you use the lazy or eager APIs, and what are the benefits of each? This week on the show, we speak with Jeroen Janssens and Thijs Nieuwdorp about their new book, Python Polars: The Definitive Guide.
Jeroen and Thijs describe how they were introduced to Polars while working at Xomnia. They were converting a large data project to Python and saw surprising speed increases using the new library.
We discuss converting projects from pandas to Polars, getting away from indexes, consistent syntax, and using lazy vs eager APIs. Along the way, Jeroen and Thijs offer tips for getting the most out of Polars in your code.
We dig into the process of writing a definitive guide and the advantages of working collaboratively on a book project. They also share resources for practicing data wrangling and building visualizations with Pydy Tuesday.
Course Spotlight: Working With Python Polars
Welcome to the world of Polars, a powerful DataFrame library for Python. In this video course, you’ll get a hands-on introduction to Polars’ core features and see why this library is catching so much buzz.
Topics:
- 00:00:00 – Introduction
- 00:02:47 – Polars start at Xomnia
- 00:04:08 – Putting Polars into production
- 00:07:18 – Realizing the speed differences
- 00:08:49 – Converting the project from R to Python
- 00:14:34 – How did Polars improve the project?
- 00:16:34 – Making the code more ergonomic and readable
- 00:19:21 – Only grabbing the data that is needed
- 00:20:37 – Titling and deciding to write the book
- 00:24:40 – Advantages to collaboration
- 00:29:34 – What were you excited to include in the book?
- 00:31:55 – Working with different engines and NVIDIA’s CUDA
- 00:35:05 – Defining a Polars expression
- 00:36:11 – Transitioning from pandas to Polars
- 00:37:34 – Not needing an index
- 00:39:56 – What inspired the syntax?
- 00:45:01 – Defining lazy vs eager workflows
- 00:49:16 – Examples covered in first chapter preview
- 00:51:51 – Video Course Spotlight
- 00:53:14 – Data formats and Arrow
- 00:55:41 – Working with NaN, null, or None
- 00:58:11 – Measuring performance through a benchmark
- 00:59:12 – Advantages to working with the Discord community
- 01:02:32 – Code examples and applying the techniques
- 01:03:34 – Pydy Tuesday
- 01:05:47 – What are you excited about in the world of Python?
- 01:09:21 – What do you want to learn next?
- 01:13:26 – What’s the best way to follow your work online?
- 01:14:14 – Thanks and goodbye