Stefanie Molin, author of Hands-On Data Analysis with Pandas, shares insights on data wrangling in Pandas, advantages over other libraries, creating Python packages, and using Matplotlib or Seaborn for visualization. Discussions include the benefits of chaining operations in Pandas, where to start learning, and her experience as a software engineer at Bloomberg.
Creating Python packages fosters reusable and explainable code.
Deep dives
Using Pandas for Data Wrangling and Visualization
Pandas is highlighted as a valuable tool for data analysts, data scientists, and machine learning engineers because it handles both data wrangling and visualization. Stefanie Molin, a software engineer at Bloomberg, shares practical tips in this episode on using the Pandas library and Python for data analysis and data visualization. She draws on her extensive experience and her bestselling book, Hands-On Data Analysis with Pandas.
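The episode summary mentions the benefits of chaining operations in Pandas. As a rough illustration only (the data, column names, and tickers below are invented for this sketch and do not come from the episode or the book), a chained wrangling pipeline might look like this:

```python
import pandas as pd

# Hypothetical toy data: closing prices for two made-up tickers.
raw = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03", "2024-01-02", "2024-01-03"],
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "close": ["10.0", "11.0", "20.0", "19.0"],  # stored as strings on purpose
})

# One chained pipeline: fix the dtypes, sort, then pivot from long to wide.
wide = (
    raw
    .assign(
        date=lambda df: pd.to_datetime(df["date"]),
        close=lambda df: df["close"].astype(float),
    )
    .sort_values(["ticker", "date"])
    .pivot(index="date", columns="ticker", values="close")
)

# Day-over-day percentage change per ticker; the first row is NaN and is dropped.
daily_change = wide.pct_change().dropna()
print(daily_change)
```

Each step returns a new DataFrame, so the pipeline reads top to bottom without intermediate variables, and any single step can be commented out while debugging.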
Transition to Matplotlib and Seaborn for Advanced Visualization
Matplotlib is recommended when you need finer control over a plot, such as customizing tick marks or other specific plot elements. Seaborn, known for its aesthetic appeal and flexibility, becomes essential when working with long-format data or when a plot needs more sophisticated color coding. Stefanie's workshops and book emphasize moving to Matplotlib and Seaborn for these specialized plotting needs.
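As a small sketch of the tick customization described above, the following uses Matplotlib's ticker module to format one axis as percentages and to place ticks on the other at whole numbers. The plotted values are made up, and the choice of formatter and locator is an assumption for illustration rather than an example taken from the episode.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0.05, 0.12, 0.30, 0.45])  # made-up values

# Show y-axis tick labels as percentages (1.0 corresponds to 100%).
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1))

# Place x-axis ticks only at multiples of 1.
ax.xaxis.set_major_locator(ticker.MultipleLocator(base=1))
```

Formatters control how tick labels are displayed, while locators control where the ticks are placed, so the two can be changed independently.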
Simulated Annealing and Data Morph Visualization
Stefanie's Data Morph library animates a dataset morphing from one shape into another while its means, standard deviations, and correlations stay fixed. It uses simulated annealing, an optimization technique, to move the points, which makes the animations an effective way to show that very different datasets can share the same summary statistics. The variety of shapes available in Data Morph helps learners see how little summary statistics alone reveal about the underlying data.
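To make the idea concrete, here is a minimal pure-Python sketch of simulated annealing that nudges points toward a circle while keeping the rounded summary statistics unchanged. It is not Data Morph's code or API: the function names, the circle target, the two-decimal rounding, the linear cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def summary(points, digits=2):
    """Means, standard deviations, and Pearson correlation, rounded."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    ss_x = sum((x - mean_x) ** 2 for x, _ in points)
    ss_y = sum((y - mean_y) ** 2 for _, y in points)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    std_x = math.sqrt(ss_x / (n - 1))
    std_y = math.sqrt(ss_y / (n - 1))
    corr = ss_xy / math.sqrt(ss_x * ss_y)
    return tuple(round(v, digits) for v in (mean_x, mean_y, std_x, std_y, corr))

def distance_to_circle(point, center=(50.0, 50.0), radius=30.0):
    """Distance from a point to the nearest spot on the target circle."""
    return abs(math.hypot(point[0] - center[0], point[1] - center[1]) - radius)

def morph(points, iterations=20000, step=0.3, seed=0):
    """Move one point at a time toward the circle, preserving rounded stats."""
    rng = random.Random(seed)
    points = list(points)
    target_stats = summary(points)
    for i in range(iterations):
        temperature = 0.4 * (1 - i / iterations)  # assumed linear cooling
        idx = rng.randrange(len(points))
        old = points[idx]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = distance_to_circle(new) < distance_to_circle(old)
        # Accept moves toward the shape; accept worse moves while still "hot".
        if not (closer or rng.random() < temperature):
            continue
        points[idx] = new
        if summary(points) != target_stats:
            points[idx] = old  # reject: the move changed the rounded statistics
    return points

# Hypothetical starting data: 60 random points in a square.
rng = random.Random(42)
start = [(rng.uniform(20, 80), rng.uniform(20, 80)) for _ in range(60)]
result = morph(start)

before = sum(distance_to_circle(p) for p in start) / len(start)
after = sum(distance_to_circle(p) for p in result) / len(result)
print(summary(start), summary(result))
print(before, after)
```

Occasionally accepting a move that goes away from the target shape early on, and less often as the temperature falls, is what distinguishes simulated annealing from a purely greedy search.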
Financial Analysis Package Creation and Modular Code Sharing
The podcast episode discusses the creation of a financial analysis package, emphasizing the importance of sharing reusable modular code in the data analysis field. The package was motivated by the need to provide tools for calculating various finance metrics in an easily explainable manner. By structuring the package to showcase different concepts such as static classes and data initialization, users can gain insights into building effective open-source software solutions.
Open Source Contribution and Learning through Library Maintenance
The podcast delves into the significance of contributing to open-source libraries like pandas, scikit-learn, and NumPy. The guest highlights the rewarding experience of fixing issues within these libraries and emphasizes the importance of giving back to the community. By actively participating in open-source maintenance, individuals not only contribute positively to the community but also enhance their own expertise and understanding of library functionalities.
Wrangling data in Pandas, when to use Pandas, Matplotlib or Seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas.
This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• The advantages of using pandas over other libraries [07:55]
• Why data wrangling in pandas is so helpful [12:05]
• Stefanie’s Data Morph library [24:27]
• When to use pandas, matplotlib, or seaborn [33:45]
• Understanding the ticker module in matplotlib [36:48]
• Where data analysts should start their learning journey [40:08]
• What it’s like being a software engineer at Bloomberg [51:19]