Christopher Trudeau returns to the show to discuss Joshua Cook's article on how he uses Python to organize his data analyses, with a focus on structuring projects using modern packaging techniques. Christopher also shares his new video course on using pandas GroupBy to manipulate, transform, and summarize data. The episode also covers recent resources from the Python community, including packaging news, working with JSON data, and running an asyncio event loop in a separate thread.
55:22
ANECDOTE
Setuptools Test Command Removal
Setuptools briefly removed the "test" command, breaking many builds.
Maintainers quickly restored it after community outcry, highlighting packaging challenges.
INSIGHT
Packaging Data Analyses
Data science projects benefit from treating analyses like packages.
This approach improves organization and portability, especially for complex projects.
ADVICE
Project Best Practices
Define all of your project's file paths in a single module so they are managed in one place.
Use enums for consistent values and type hints for clarity; both reduce repetition.
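A minimal sketch of this advice, assuming a project whose data lives in a `data/` directory at the project root. The names `PROJECT_ROOT`, `DATA_DIR`, `Condition`, and `raw_data_path` are hypothetical and not taken from Joshua Cook's article.

```python
"""paths.py: one module that owns every file path in the project (hypothetical layout)."""
from enum import Enum
from pathlib import Path

# Assumption: this file sits one directory below the project root.
PROJECT_ROOT: Path = Path(__file__).resolve().parent.parent
DATA_DIR: Path = PROJECT_ROOT / "data"
RAW_DIR: Path = DATA_DIR / "raw"
PROCESSED_DIR: Path = DATA_DIR / "processed"


class Condition(str, Enum):
    """Hypothetical experimental conditions, spelled once instead of as scattered strings."""

    CONTROL = "control"
    TREATED = "treated"


def raw_data_path(condition: Condition) -> Path:
    """Return the raw CSV path for a condition (the file-naming scheme is an assumption)."""
    return RAW_DIR / f"{condition.value}.csv"
```

Analysis code can then import `raw_data_path` and `Condition` instead of repeating string paths, and a type checker can flag a call that passes a plain string where a `Condition` is expected.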
What are the best practices for organizing data analysis projects in Python? What are the advantages of a more package-centric approach to data science? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder’s Weekly articles and projects.
We discuss Joshua Cook’s recent article “How I Use Python to Organize My Data Analyses.” The article covers how his process for building data analysis projects has evolved and now incorporates modern Python packaging techniques.
Christopher shares his recent video course on grouping real-world data with pandas. The course offers a quick refresher before digging into how to use pandas GroupBy to manipulate, transform, and summarize data.
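As a rough illustration of the summarize, transform, and chaining patterns the course covers, here is a small sketch on made-up data. The `sales` DataFrame and its columns are hypothetical and are not one of the course's real-world datasets.

```python
import pandas as pd

# Hypothetical toy data standing in for a real-world dataset.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "product": ["a", "a", "b", "b", "a"],
        "units": [10, 4, 7, 3, 5],
    }
)

# Summarize: one row per region with several aggregates.
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])

# Transform: a per-row result aligned with the original index
# (each row's share of its region's total units).
sales["region_share"] = sales["units"] / sales.groupby("region")["units"].transform("sum")

# Chain methods: total units per (region, product), largest first.
top = (
    sales.groupby(["region", "product"], as_index=False)["units"]
    .sum()
    .sort_values("units", ascending=False)
)

print(summary)
print(top)
```

The key difference is the output shape: `agg` collapses each group to a single row, while `transform` returns one value per original row, which is why its result can be assigned straight back as a new column.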
We also share several other articles and projects from the Python community, including a news roundup, working with JSON data in Python, running an Asyncio event loop in a separate thread, knowing the why behind a system’s code, a retro game engine for Python, and a project for vendorizing packages from PyPI.
Topics:
00:00:00 – Introduction
00:02:18 – Setuptools Breaks Things, Then Fixes Them
00:04:57 – PEP 751: A File Format to List Python Dependencies
00:07:04 – Python 3.13.0 Release Candidate 1 Released
00:07:15 – Python Insider: Python 3.12.5 released
00:07:22 – Django 5.1 released - Django Weblog
00:07:27 – Django security releases issued: 5.0.8 and 4.2.15
00:07:49 – How I Use Python to Organize My Data Analyses
00:13:45 – Sponsor: Mailtrap
00:14:21 – pandas GroupBy: Grouping Real World Data in Python
00:20:33 – Working With JSON Data in Python
00:25:01 – Asyncio Event Loop in Separate Thread
00:30:33 – Video Course Spotlight
00:31:47 – Habits of great software engineers
00:49:17 – pyxel: A Retro Game Engine for Python
00:52:36 – python-vendorize: Vendorize Packages From PyPI
00:54:18 – Thanks and goodbye
News:
Setuptools Breaks Things, Then Fixes Them – This post is Bite Code’s monthly summary, but the lead story happened just days ago. In line with a 7-year-old deprecation, setuptools finally removed the ability to call its test command. Many packages promptly broke, and the change was undone the following day.
How I Use Python to Organize My Data Analyses – Joshua describes how he uses Python in a package-centric way to organize his data analyses, a system he developed during his Ph.D. in computational biology and his work in industry.
pandas GroupBy: Grouping Real World Data in Python – In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.
Working With JSON Data in Python – In this tutorial, you’ll learn how to read and write JSON-encoded data in Python. You’ll begin with practical examples that show how to use Python’s built-in “json” module and then move on to learn how to serialize and deserialize custom data.
Asyncio Event Loop in Separate Thread – Typically, the asyncio event loop runs in the main thread, but because the interpreter also uses that thread, you sometimes want the event loop to run in a separate thread instead. This article covers why and how to do just that.
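The JSON tutorial's final topic, serializing and deserializing custom data, can be sketched with the standard library's `json.dumps(default=...)` and `json.loads(object_hook=...)` hooks. The `Point` class and the `"__point__"` marker key are hypothetical choices for this example, not something the tutorial prescribes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Point:
    """A hypothetical custom type that json cannot serialize on its own."""

    x: float
    y: float


def encode_custom(obj: Any) -> dict:
    """Called by json.dumps for any object it doesn't know how to serialize."""
    if isinstance(obj, Point):
        # Tag the dict so the decoder can recognize it later.
        return {"__point__": True, **asdict(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_custom(d: dict) -> Any:
    """Called by json.loads for every decoded JSON object."""
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d


data = {"name": "origin", "location": Point(0.0, 0.0)}
text = json.dumps(data, default=encode_custom)
restored = json.loads(text, object_hook=decode_custom)
```

The marker key is one simple way to make the round trip unambiguous; without it, the decoder would have to guess which plain dicts should become `Point` objects.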
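One common way to run an event loop in a background thread is sketched below using only standard-library calls (`asyncio.new_event_loop`, `loop.run_forever`, `asyncio.run_coroutine_threadsafe`). This is a generic pattern and not necessarily the exact approach the linked article takes; `fetch_value` is a hypothetical stand-in for real asynchronous work.

```python
import asyncio
import threading


def start_background_loop() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Create a new event loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()

    def run() -> None:
        asyncio.set_event_loop(loop)  # make it the current loop for this thread
        loop.run_forever()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return loop, thread


async def fetch_value(x: int) -> int:
    """Hypothetical async work; the sleep stands in for real I/O."""
    await asyncio.sleep(0.01)
    return x * 2


loop, thread = start_background_loop()

# From the main (synchronous) thread, submit a coroutine to the background loop.
# run_coroutine_threadsafe returns a concurrent.futures.Future we can block on.
future = asyncio.run_coroutine_threadsafe(fetch_value(21), loop)
result = future.result(timeout=5)

# Shut down: schedule loop.stop on the loop's own thread, wait, then close it.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

The main thread stays free for synchronous code, and `call_soon_threadsafe` is used for shutdown because the loop's methods are not otherwise safe to call from another thread.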