

The Real Python Podcast
Real Python
A weekly Python podcast hosted by Christopher Bailey with interviews, coding tips, and conversation with guests from the Python community.
The show covers a wide range of topics including Python programming best practices, career tips, and related software development topics. Join us every Friday morning to hear what's new in the world of Python programming and become a more effective Pythonista.
Episodes

Jul 10, 2020 • 50min
Linear Programming, PySimpleGUI, and More
Are you familiar with linear programming, and how it can be used to solve resource optimization problems? Would you like to free your Python code from a clunky command line and start making convenient graphical interfaces for your users? This week on the show, David Amos is back with another batch of PyCoder’s Weekly articles and projects.
David talks about a recent Real Python article about linear programming in Python. We discuss an article titled “PySimpleGUI: The Simple Way to Create a GUI With Python.” We also cover several other articles and projects from the Python community including: Python’s reduce() function, flaws in the pickle module, advanced pytest techniques, and how to trick a neural network.
Course Spotlight: Parallel Iteration With Python’s zip() Function
This course will get you up to speed with Python’s zip() function. In this course, you’ll discover the logic behind zip() and how you can use it to consistently solve common programming problems, like creating dictionaries.
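As a rough illustration of the dictionary use case mentioned in the spotlight, here is a minimal sketch. The sample lists are hypothetical and are not taken from the course.

```python
# Hypothetical sample data, not from the course.
fields = ["name", "language", "year"]
values = ["Guido", "Python", 1991]

# zip() pairs items by position; dict() turns those pairs into key/value entries.
record = dict(zip(fields, values))
print(record)

# zip() stops at the shortest input, so the unmatched 3 is silently dropped.
pairs = list(zip([1, 2, 3], ["a", "b"]))
print(pairs)
```

If dropping unmatched items is not what you want, itertools.zip_longest() from the standard library pads the shorter input instead.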
Topics:
00:00:00 – Introduction
00:01:34 – Python’s reduce(): From Functional to Pythonic Style
00:07:46 – Hands-On Linear Programming: Optimization With Python
00:15:07 – Pickle’s Nine Flaws
00:22:31 – Video Course Spotlight
00:23:33 – Advanced pytest Techniques I Learned While Contributing to pandas
00:33:41 – PySimpleGUI: The Simple Way to Create a GUI With Python
00:38:20 – How to Trick a Neural Network in Python 3
00:43:31 – TextAttack: A Python Framework for Adversarial Attacks, Data Augmentation, and Model Training in NLP
00:46:09 – byob: BYOB (Build Your Own Botnet)
00:49:09 – Thanks and Goodbye
Show Links:
Python’s reduce(): From Functional to Pythonic Style – In this step-by-step tutorial, you’ll learn how Python’s reduce() works and how to use it effectively in your programs. You’ll also learn some more modern, efficient, and Pythonic ways to gently replace reduce() in your programs.
Hands-On Linear Programming: Optimization With Python – In this tutorial, you’ll learn about implementing optimization in Python with linear programming libraries. Linear programming is one of the fundamental mathematical optimization techniques. You’ll use SciPy and PuLP to solve linear programming problems.
Pickle’s Nine Flaws – “Python’s pickle module is a very convenient way to serialize and de-serialize objects. It needs no schema, and can handle arbitrary Python objects. But it has problems. This post briefly explains the problems.”
Advanced pytest Techniques I Learned While Contributing to pandas – Contributing to open-source projects is a great way to learn new techniques and level up your skills. Martin Winkel shares five advanced pytest techniques he learned while contributing to the pandas project.
PySimpleGUI: The Simple Way to Create a GUI With Python – In this step-by-step tutorial, you’ll learn how to create a cross-platform graphical user interface (GUI) using Python and PySimpleGUI. A graphical user interface is an application that has buttons, windows, and lots of other elements that the user can use to interact with your application.
How to Trick a Neural Network in Python 3 – Is that a corgi or a goldfish?
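For listeners who have not used reduce() before, the sketch below shows the general idea behind the first link: a reduce() call next to a built-in that gives the same result. The numbers are made up for illustration, and the tutorial itself may use different examples.

```python
from functools import reduce
import math
import operator

# Hypothetical input, chosen only for illustration.
numbers = [1, 2, 3, 4]

# reduce() folds the list into one value by applying the function cumulatively.
total_reduce = reduce(operator.add, numbers, 0)
product_reduce = reduce(operator.mul, numbers, 1)

# Built-ins that express the same intent more directly.
total_builtin = sum(numbers)
product_builtin = math.prod(numbers)  # available in Python 3.8 and later

print(total_reduce, total_builtin)
print(product_reduce, product_builtin)
```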
Projects:
TextAttack: A Python Framework for Adversarial Attacks, Data Augmentation, and Model Training in NLP
byob: Build Your Own Botnet
Additional Links:
PyCoder’s Weekly
Functional Programming in Python
Linear Programming: Wikipedia article
The Python pickle Module: How to Persist Objects in Python
Marshmallow
Python REST APIs With Flask, Connexion, and SQLAlchemy – Part 2
Effective Python Testing With Pytest
Getting Started With Testing in Python
Practical Text Classification With Python and Keras
PySimpleGUI
Level up your Python skills with our expert-led courses:
Supercharge Your Classes With Python super()
Functional Programming in Python
Parallel Iteration With Python's zip() Function
Support the podcast & join our community of Pythonistas

Jul 3, 2020 • 1h 2min
Thinking in Pandas: Python Data Analysis the Right Way
Are you using the Python library Pandas the right way? Do you wonder about getting better performance, or how to optimize your data for analysis? What does normalization mean? This week on the show we have Hannah Stepanek to discuss her new book “Thinking in Pandas”.
The inspiration behind Hannah’s book came out of her talk at PyCon US 2019 titled “Thinking Like a Panda: Everything You Need to Know to Use Pandas the Right Way.” We discuss several core concepts covered in the book. She shares techniques for getting more performance when working with your data in Pandas. We also talk about her recent PyCon US 2020 online presentation about databases and migration.
Course Spotlight: Finding the Perfect Python Code Editor
Find your perfect Python development setup with this review of Python IDEs and code editors. With this course you’ll get an overview of the most common Python coding environments to help you make an informed decision.
Topics:
00:00:00 – Introduction
00:01:36 – Working for New Relic
00:03:14 – Thinking in Pandas book release
00:03:27 – Who is the intended reader?
00:05:27 – What is the underlying tech for Pandas?
00:09:04 – Why you shouldn’t use apply?
00:13:00 – When you have to use apply
00:16:06 – Normalizing your data
00:17:05 – Do you have a preferred format for a dataframe?
00:18:17 – More on multi-index dataframes
00:24:50 – Creating NumPy types
00:28:30 – Loading in your data
00:30:33 – Video Course Spotlight
00:31:41 – Pivoting data
00:34:34 – Considering outside libraries and performance
00:35:41 – What topic were you eager to share in the book?
00:37:52 – What resources did you use to learn pandas?
00:40:53 – PyCon 2020 talk about databases and migration
00:45:34 – Delving into migration and Alembic
00:53:15 – Speaking opportunities
00:56:13 – What are you excited about in the world of Python?
00:57:32 – What do you want to learn next?
00:58:49 – Do you read source code to learn?
01:00:16 – Is there a particularly well-written library?
01:01:28 – Final Thanks
Links:
Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way - Apress
Thinking like a Panda: Everything you need to know to use pandas the right way - PyCon 2019 - Hannah Stepanek
pandas
CPython Internals: Your Guide to the Python 3 Interpreter
MultiIndex / advanced indexing: pandas documentation
NumPy Data type objects (dtype)
pandas.DataFrame.pivot: pandas documentation
Let’s talk Databases in Python: SQLAlchemy and Alembic - PyCon 2020 - Hannah Stepanek
SQLAlchemy: The Python SQL Toolkit and Object Relational Mapper
Alembic: A database migration tool for SQLAlchemy
import asyncio: Learn Python’s AsyncIO #1 - The Async Ecosystem
Level up your Python skills with our expert-led courses:
Finding the Perfect Python Code Editor
Histogram Plotting in Python: NumPy, Matplotlib, Pandas & Seaborn
Idiomatic pandas: Tricks & Features You May Not Know

Jun 26, 2020 • 45min
Python Regular Expressions, Views vs Copies in Pandas, and More
This episode covers regular expressions in Python, views vs copies in Pandas, and methods for flattening a list in Python. It also touches on combining Flask and Vue, machine learning in production, space science with Python, and a video course on reading and writing files in Python.

Jun 19, 2020 • 55min
Going Serverless with Python
Learn about the advantages of serverless computing with Python in the cloud and how it suits data science, machine learning, and API creation. Discover how to use Azure Functions and VS Code for serverless development. Explore topics such as blob storage integration, working with service principals, and setting up and running serverless functions locally. The episode also discusses real-time functionality with Flask-SocketIO and delves into the fascination with threading in Python.

Jun 12, 2020 • 45min
PDFs in Python and Projects on the Raspberry Pi
Have you wanted to work with PDF files in Python? Maybe you want to extract text, merge and concatenate files, or even create PDFs from scratch. Are you interested in building hardware projects using a Raspberry Pi? This week on the show we have David Amos from the Real Python team to discuss his recent article on working with PDFs. David also brings a few other articles from the wider Python community for us to discuss.
David searches for the latest Python news, links, and articles to produce PyCoder’s Weekly with Dan Bader. PyCoder’s Weekly is a free email newsletter for those interested in Python development. Along with David’s article on PDFs, we discuss another recent Real Python article about building physical projects with the Raspberry Pi. We also discuss articles from the community about: the PEPs of Python 3.9, why you should stop using datetime.now, Python dependency tools, and several ways to pass code to Python from the terminal.
Course Spotlight: Cool New Features in Python 3.8
This course will get you up to speed with the new features of the latest release of Python. You’ll learn about using assignment expressions, how to enforce positional-only arguments, more precise type hints, and using f-strings for simpler debugging. It’s a worthy investment of your time to understand what the most recent release of Python provides before moving on to the next version this fall.
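Three of the features named in the spotlight fit into a few lines. This is a minimal sketch that needs Python 3.8 or later; the describe() function and its arguments are hypothetical names invented for the example, not something from the course.

```python
def describe(values, /, *, label="n"):
    # The "/" makes values positional-only: describe(values=[...]) raises TypeError.
    # The walrus operator (:=) assigns count inside the condition itself.
    if (count := len(values)) > 3:
        return f"{label}: {count} items (long)"
    return f"{label}: {count} items"


count = 3
# The "=" specifier in an f-string shows the expression text along with its value.
debug_text = f"{count=}"

print(describe([1, 2, 3, 4]))
print(debug_text)
```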
Topics:
00:00:00 – Introduction
00:02:06 – Ways to Pass Code to Python From the Terminal
00:05:54 – The PEPs of Python 3.9
00:10:54 – Creating and Modifying PDF Files in Python
00:18:51 – Video Course Spotlight
00:19:56 – An Overview of Python Dependency Tools
00:26:55 – Stop Using datetime.now
00:31:44 – Build Physical Projects With Python on the Raspberry Pi
00:38:18 – What are you excited about in the world of Python?
00:42:29 – What do you want to learn next in Python?
00:44:31 – Thanks and Goodbye
Topic Links:
PyCoder’s Weekly
The Many Ways to Pass Code to Python From the Terminal – You might know about pointing Python to a file path, or using -m to execute a module. But did you know that Python can execute a directory? Or a .zip file?
The PEPs of Python 3.9 – The first Python 3.9 beta release is upon us! Learn what to expect in the final October release by taking a tour of the Python Enhancement Proposals (PEPs) that were accepted for Python 3.9.
Creating and Modifying PDF Files in Python – Explore the different ways of creating and modifying PDF files in Python. You’ll learn how to read and extract text, merge and concatenate files, crop and rotate pages, encrypt and decrypt files, and even create PDFs from scratch.
Overview of Python Dependency Management Tools – While pip is often considered the de facto Python package manager, the dependency management ecosystem has really grown over the last few years. Learn about the different tools available and how they fit into this ecosystem.
Stop Using datetime.now! (With Dependency Injection) – How do you test a function that relies on datetime.now() or date.today()? You could use libraries like FreezeGun or libfaketime, but not every project can afford the luxury of reaching for third-party solutions. Learn how dependency injection can help you write code that is more testable, maintainable, and practical.
Build Physical Projects With Python on the Raspberry Pi – In this tutorial, you’ll learn to use Python on the Raspberry Pi. The Raspberry Pi is one of the leading physical computing boards on the market and a great way to get started using Python to interact with the physical world.
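The dependency injection idea from the datetime.now article can be sketched as follows. This is one common way to do it, under the assumption that passing the current time as an optional argument is acceptable; is_expired() and its parameters are hypothetical names, and the article may structure its solution differently.

```python
from datetime import datetime, timedelta


def is_expired(created_at, ttl, now=None):
    # Production callers omit now and get the real clock;
    # tests inject a fixed datetime so the result is deterministic.
    current = now if now is not None else datetime.now()
    return current - created_at > ttl


# Fixed timestamps, chosen only for illustration.
created = datetime(2020, 6, 1, 12, 0)
later = datetime(2020, 6, 1, 14, 0)

print(is_expired(created, timedelta(hours=1), now=later))
```

Because the clock is a parameter, the test needs no patching library, which is the trade-off the link description points at.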
Additional Links:
Python Basics: A Practical Introduction to Python 3
PEG Parsers - Guido van Rossum - Medium article
Code with Mu: a simple Python editor for beginner programmers
SSH (Secure Shell)
Visual Studio Code
VSCode - Remote Development using SSH
VIM and Python – A Match Made in Heaven - Real Python article
How to Build a Python GUI Application With wxPython - Real Python article
import asyncio: Learn Python’s AsyncIO #1 - The Async Ecosystem
python-rtmidi - A Python binding for the RtMidi C++ library
Level up your Python skills with our expert-led courses:
Arduino With Python: Getting Started
Finding the Perfect Python Code Editor
Cool New Features in Python 3.8

Jun 5, 2020 • 50min
Web Scraping in Python: Tools, Techniques, and Legality
Do you want to get started with web scraping using Python? Are you concerned about the potential legal implications? What are the tools required and what are some of the best practices? This week on the show we have Kimberly Fessel to discuss her excellent tutorial created for PyCon 2020 online titled “It’s Officially Legal so Let’s Scrape the Web.”
We discuss getting started with web scraping, and cover tools and techniques. Kimberly gives advice on finding elements inside the HTML, and techniques for cleaning your data. She also notes a recent change to the legal landscape regarding scraping the web.
Kimberly is a Senior Data Scientist at Metis Data Science Bootcamp in New York City. She holds a Ph.D. in applied mathematics. We talk about her switch from academia to data science, and discuss her passion for data storytelling and visualizations.
Course Spotlight: Defining Main Functions in Python
This course will get you up to speed with defining a starting point for the execution of a program, and helps you to understand what goes into the main() function. Prepare for a deep dive as you go through the sections. It’s a worthy investment of your time to understand this vital entry point for your Python scripts and applications!
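For reference, the entry-point idiom the spotlight refers to usually looks something like the sketch below. The greet() helper and its message are hypothetical, added only so that main() has something to call.

```python
def greet(name):
    # Keeping logic out of main() makes it importable and easy to test.
    return f"Hello, {name}!"


def main():
    print(greet("Pythonista"))


# This guard is true only when the file is run as a script,
# so importing the module does not trigger main().
if __name__ == "__main__":
    main()
```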
Topics:
00:00:00 – Introduction
00:01:31 – Kimberly’s background and Metis Data Science Bootcamp
00:02:19 – NLP and work in advertising
00:03:27 – Changes in the legality of web scraping
00:06:12 – What are good projects for web scraping?
00:06:56 – Tools to start web scraping
00:07:51 – How to find the elements you want?
00:09:00 – How much HTML should you know?
00:10:49 – Inspecting elements in the browser
00:14:30 – What are good sites to practice on?
00:16:20 – Pausing between requests
00:19:02 – Saving as you go
00:20:54 – Real Python Video Course Spotlight
00:21:55 – Navigating the DOM
00:23:10 – Data cleaning and formatting
00:28:26 – Dynamic sites and Selenium
00:32:16 – Scrapy
00:33:55 – PyOhio 2020
00:35:40 – Transition out of academia
00:38:40 – What are you excited about in the world of Python?
00:41:05 – What do you want to learn next in Python?
00:48:00 – What is a less known Python tip or trick?
00:49:17 – Thanks and Goodbye
Show Links:
Kimberly Fessel, PHD - Blog
Metis: Data Science Training
It’s Officially Legal so Let’s Scrape the Web: PyCon 2020 online - Tutorial
Victory! Ruling in hiQ v. Linkedin Protects Scraping of Public Data: EFF.org
Computer Fraud and Abuse Act - Wikipedia Article
Box Office Mojo
Sports Reference | Sports Stats, fast, easy, and up-to-date
Springfield! Springfield! - TV & Movie Scripts - Archive.org
Jupyter Notebook: An Introduction - Real Python Article
The Python pickle Module: How to Persist Objects in Python - Real Python Article
A Practical Introduction to Web Scraping in Python - Real Python Article
Beautiful Soup: Build a Web Scraper With Python - Real Python Article
Making HTTP Requests With Python - Real Python Video Course
Natural Language Processing With spaCy in Python - Real Python Article
Delorean: Time Travel Made Easy
Maya: Datetimes for Humans
Regular Expressions: Regexes in Python (Part 1) - Real Python Article
Selenium: Automates browsers. That’s it!
Scrapy: Framework for extracting the data you need from websites
PyOhio 2020
ODSC: Open Data Science Conference
Slides from Kimberly’s talk - Level Up: Fancy NLP with Straightforward Tools
Tonks: A general purpose deep learning library
Tonks: Building One (Multi-Task) Model to Rule Them All! - Medium Article
Plotly | Dash
geoplotlib: Python toolbox for visualizing geographical data and making maps
GeoPandas: Make working with geospatial data in Python easier
Altair: Declarative Visualization in Python
Understanding the Transform Function in Pandas: Practical Business Python
JavaScript charting detour:
Down and Up: A Puzzle Illustrated with D3.js - Kimberly’s blog
d3js - Data-Driven Documents
Crossfilter: Fast Multidimensional Filtering for Coordinated Views
dc.js - Dimensional Charting JavaScript Library
Level up your Python skills with our expert-led courses:
Defining Main Functions in Python
Making HTTP Requests With Python
Strings and Character Data in Python

May 29, 2020 • 58min
Advice on Getting Started With Testing in Python
Have you wanted to get started with testing in Python? Maybe you feel a little nervous about diving in deeper than just confirming your code runs. What are the tools needed and what would be the next steps to level up your Python testing? This week on the show we have Anthony Shaw to discuss his article on this subject. Anthony is a member of the Real Python team and has written several articles for the site.
We discuss getting started with built-in Python features for testing and the advantages of a tool like pytest. Anthony talks about his plug-ins for pytest, and we touch on the next level of testing involving continuous integration.
Anthony recently finished a talk for PyCon 2020 Online, titled “Why is Python Slow?” He had the idea for the talk while he was working on his upcoming book about the CPython source code.
I also want to give an update on last week’s episode with Kyle Stratis, where we discussed Kyle being let go from his job due to the pandemic. Here’s some good news: Kyle will be joining a Boston startup called Vizit as a senior data engineer. Congratulations, Kyle!
Course Spotlight: The Python print() Function: Go Beyond the Basics
This course will get you up to speed with using Python print() effectively. Prepare for a deep dive as you go through the sections. You may be surprised how much print() has to offer!
Topics:
00:00:00 – Introduction
00:01:46 – PyCon 2020 Online Talk - Why is Python slow?
00:04:05 – CPython Internals Book
00:07:08 – Attending Conferences
00:09:01 – Getting Started with Testing in Python
00:12:32 – Unittest
00:17:16 – What does a tool like pytest add?
00:19:53 – pytest plugins
00:21:03 – Anthony’s pytest plugins
00:21:58 – What does coverage mean?
00:25:23 – Test runners
00:27:12 – Testing environments with Tox
00:30:50 – Real Python Video Course Spotlight
00:31:49 – More on continuous integration (CI)
00:37:21 – Recent changes to GitHub
00:38:21 – PSF to move issue tracker to GitHub
00:41:01 – DRY (Don’t Repeat Yourself)
00:43:46 – Benefits of linters and code formatting
00:48:00 – What is a little known part of Python?
00:52:16 – What are you excited about in the world of Python?
00:56:06 – What is something you thought you knew about Python, but were wrong about it?
00:57:27 – Goodbye and thanks
Show links:
Why is Python slow?: PyCon 2020 Online Talk
Your Guide to the CPython Source Code: Real Python article
TalkPython Podcast Episode #265: Why is Python slow?
Getting Started With Testing in Python: Real Python article
pytest: helps you write better programs
pytest-azurepipelines: Plugin for pytest that makes it simple to work with Azure Pipelines
Effective Python Testing With Pytest
tox automation project: Command line driven CI frontend
GitHub Actions: Automate your workflow from idea to production
Continuous Integration With Python: An Introduction: Real Python article
Brian K Okken - Multiply your Testing Effectiveness with Parameterized Testing: PyCon 2020 Online Talk
Python Testing with pytest: Brian Okken - The Pragmatic Bookshelf
Test & Code: Python Testing for Software Engineering: Podcast
Python’s migration to GitHub
Refactoring Python Applications for Simplicity: Real Python article
Black: The uncompromising code formatter
Wily: A command-line application for tracking, reporting on complexity of Python tests and applications
PEP 554 – Multiple Interpreters in the Stdlib
Python Insider: Python core development news and information
Level up your Python skills with our expert-led courses:
Continuous Integration With Python
Test-Driven Development With pytest
The Python print() Function: Go Beyond the Basics

May 22, 2020 • 1h 20min
Python Job Hunting in a Pandemic
Do you know someone in the Python community who recently was let go from their job due to the pandemic? What does the job landscape currently look like? What are skills and techniques that will help you in your job search? This week we have Kyle Stratis on the show to discuss how he is managing his job search after just being let go from his data engineering job. Kyle is a member of the Real Python team and has written several articles for the site.
We discuss Kyle’s career and the skills that he’s developed, which are currently helping him in his job search. Kyle left academia to work as a data engineer. His background helps him to communicate between teams of scientists and engineers.
We also talk about Kyle’s recent article on combining data in Pandas. Kyle shares a tip on Pandas efficiency, and hints at some lesser known features of Python generators.
Topics:
00:00:00 – Introduction
00:01:27 – Kyle’s background on being let go
00:04:17 – Programming background and building connections
00:10:18 – Becoming a Data Engineer
00:15:59 – Translating between science and data teams
00:20:35 – Every job has different language requirements
00:23:44 – Getting out of your Python language comfort zone
00:27:08 – NASDANQ project - a stock market for memes
00:30:34 – Learning the power of building a network
00:35:13 – Using skills developed in outside projects
00:38:45 – What does the job landscape look like currently?
00:49:52 – Writing for Real Python
00:52:53 – Combining data in Pandas article
00:55:22 – Merging in Pandas
01:03:05 – Feedback and community
01:10:37 – What are you excited about in the world of Python?
01:12:12 – What is something you thought you knew about Python but were wrong about it?
01:14:01 – What is a little known Python trick or tip?
01:14:33 – More efficient Pandas
01:15:52 – Using more of the advanced features of generators
01:18:55 – Thanks and Goodbye
Show Links:
Kyle’s Blog
Kyle’s LinkedIn
A MongoDB Optimization: Kyle Stratis’ Blog
Memes are serious business with their own stock exchange: CNET
How a group of Redditors is creating a fake stock market to figure out the value of memes: The Verge
The joke Meme Economy is a now real thing called NASDANQ: AV Club
Forbes Did A V. Serious Analysis Of NASDANQ, The Stock Market For Memes: Pedestrian
Domi Station in Tallahassee
Combining Data in Pandas With merge(), .join(), and concat(): Real Python article
A Visual Explanation of SQL Joins: Coding Horror
Wily: A command-line application for tracking, reporting on complexity of Python tests and applications
Refactoring Python Applications for Simplicity: Real Python article
Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects: Real Python article
How to Use Generators and yield in Python: Real Python article
Level up your Python skills with our expert-led courses:
Python Coding Interviews: Tips & Best Practices
Sorting Data With Python
Idiomatic pandas: Tricks & Features You May Not Know

May 15, 2020 • 1h 16min
Leveling Up Your Python Literacy and Finding Python Projects to Study
In your quest to become a better developer, how do you find Python code that is at your reading level? What are good code bases or projects to study? What are the things holding you back from leveling up your Python literacy? This week we have Cecil Phillip on the show to discuss all of these common questions. Cecil is a Senior Cloud Advocate at Microsoft.
Cecil has been learning Python in the open on Twitch with Brian Clark. They run a weekly event on Twitch, where they are live-streaming an interactive Python course. Cecil has a background in multiple languages and technologies, and now he’s learning Python, bringing an audience along the way!
We start things off with a listener question and jump into a conversation about building up your Python skills. Then we’ll discuss common Python language stumbling blocks. Next we consider the importance of making personal projects, and documenting that code.
We also touch on some unique skills employers are looking for. And we discuss working through impostor syndrome. Cecil talks about his podcast “Away from the Keyboard” and his plans to start it back up.
In the show notes this week you’ll find links to resources we discuss, and several more that we didn’t have time to cover individually.
Want your question featured on the show? Send us your question at realpython.com/podcast-question and we might feature it on a future episode of the show.
Topics:
00:00:00 – Intro
00:01:52 – Cecil’s role at Microsoft
00:03:35 – Twitch Stream with Brian Clark
00:05:07 – Learning in front of an audience
00:13:05 – Listener’s question
00:14:46 – Finding code that’s at your level
00:20:31 – Understanding more complex syntax in Python
00:23:40 – Breaking down complexity
00:29:17 – Translation of code
00:31:55 – Importance of making projects and comments
00:36:28 – Finding community
00:41:23 – Open source contributing
00:42:25 – Dealing with impostor syndrome
00:49:09 – Looking for that first position
01:00:58 – More project resources in show notes
01:02:55 – Cecil’s podcast - Away from the keyboard
01:08:29 – What are you excited about in the world of Python?
01:10:14 – What is something you thought you knew about Python but were wrong about it?
01:12:01 – What’s the next thing you want to learn in Python?
01:13:37 – Read the actual Python docs
01:15:24 – Thanks and goodbye
Show links:
Microsoft Developer Channel
Cecil Phillip’s Twitter
Cecil’s Github
Microsoft Developer Twitch
Official Microsoft Python Discord
Away from the Keyboard: Podcast
Python Decorators 101: Real Python video course
Python Type Checking: Real Python video course
13 Project Ideas for Intermediate Python Developers: Real Python article
Suggested project reading list:
Flask: The Python micro framework for building web applications.
Django: The Web framework for perfectionists with deadlines
Howdoi: instant coding answers via the command line
Curio: A coroutine-based library for concurrent Python systems programming
scikit-learn: machine learning in Python
SQLAlchemy: The Database Toolkit for Python
Requests: A simple, yet elegant HTTP library
Markupsafe: Safely add untrusted strings to HTML/XML markup
Ask HN: Good Python codebases to read?
The Hitchhiker’s Guide to Python: Reading Great Code
Welcome! This is the documentation for Python 3.8
Level up your Python skills with our expert-led courses:
Intro to Object-Oriented Programming (OOP) in Python
Python Decorators 101
Python Type Checking

May 8, 2020 • 56min
Docker + Python for Data Science and Machine Learning
Docker is a common tool for Python developers creating and deploying applications, but what do you need to know if you want to use Docker for data science and machine learning? What are the best practices if you want to start using containers for your scientific projects? This week we have Tania Allard on the show. She is a Sr. Developer Advocate at Microsoft focusing on Machine Learning, scientific computing, research and open source.
Tania created a talk for PyCon US 2020, which is now online. The talk is titled “Docker and Python: Making them Play Nicely and Securely for Data Science and ML.” Her talk draws on her expertise in the improvement of processes, reproducibility and transparency in research and data science. We discuss a variety of tools for making your containers more secure and results reproducible.
Tania is passionate about mentoring, open-source, and its community. She is an organizer for Mentored Sprints for Diverse Beginners, and she talks about the upcoming online sprints for PyCon US 2020. We also discuss her plans to start a podcast.
Topics:
00:00:00 – Introduction
00:01:43 – Microsoft Senior Developer Advocate Role
00:04:07 – PyCon 2020 Talk - Docker and Python: making them play nicely
00:05:34 – What is Docker?
00:10:08 – Reproducibility of project results
00:12:03 – What are the challenges of using Docker for machine learning?
00:15:06 – Getting started suggestions
00:16:26 – What metadata should be included?
00:17:48 – Creating images through stages
00:21:16 – What about your data?
00:22:40 – Kubernetes: Orchestrating containers
00:24:37 – Continuing stages into testing
00:25:37 – What are tools for testing security?
00:27:07 – Challenges in using containers for ML
00:28:52 – What types of databases?
00:29:39 – Are you doing initial research on a local machine?
00:30:59 – An example of a recent ML project
00:32:16 – Papermill: parameterizing and executing notebooks
00:33:16 – NLP: Natural Language Processing
00:33:58 – Kaggle: Help us better understand COVID-19
00:34:42 – What are other best practices for data intensive projects?
00:39:13 – Resources to get started in machine learning?
00:40:30 – Mentored Sprints for Diverse Beginners
00:45:34 – Tania’s upcoming podcast
00:48:38 – A visiting fellow at the Alan Turing Institute
00:49:08 – Weight lifting
00:50:16 – Craft beer
00:52:09 – What is something you thought you knew in Python but were wrong about?
00:53:50 – What are you excited about in the world of Python?
00:54:42 – Thank you and Goodbye
Show links:
Tania Allard: Personal site
Docker and Python: making them play nicely and securely for Data Science and ML - Tania Allard
Slides for Docker and Python Talk
Docker
XKCD: Python Superfund Site
Best practices for writing Dockerfiles
Run Python Versions in Docker: How to Try the Latest Python Release
Kubernetes: Production-Grade Container Orchestration
Snyk: Securing open source and containers
papermill: A tool for parameterizing and executing Jupyter Notebooks
Natural Language Processing: Wikipedia article
Natural Language Processing With spaCy in Python: Real Python article
Kaggle: Help us better understand COVID-19
datree.io: Scale Engineering organization
repo2docker: Build, Run, and Push Docker Images from Source Code Repositories
Jupyter Docker Stacks: A set of ready-to-run Docker images
binder: Turn a Git Repo into a Collection of Interactive Notebooks
Hands-On Machine Learning with Scikit-Learn and TensorFlow: O’Reilly
Data Science from Scratch: O’Reilly
Python for Data Analysis: Wes McKinney - Creator of Pandas
Mentored Sprints for Diverse Beginners
The Alan Turing Institute
Easy Data Processing With Azure Fun - Tania Allard - PyCon 2020
PEP 581 – Using GitHub Issues for CPython
Python’s migration to GitHub - Request for Project Manager Resumes
Level up your Python skills with our expert-led courses:
Using Jupyter Notebooks
Histogram Plotting in Python: NumPy, Matplotlib, Pandas & Seaborn
Idiomatic pandas: Tricks & Features You May Not Know


