Talk Python To Me

#519: Data Science Cloud Lessons at Scale

Sep 18, 2025
In this discussion, Matthew Rocklin, the creator of Dask and co-founder of Coiled, and Nat Tabris, a staff software engineer at Coiled, delve into the complexities of running Python workloads at cloud scale. They describe what it is like to outgrow local data processing and give a live demo of spinning up a 1,000-core cluster. The pair share insights on choosing between pandas and Polars, optimizing costs, and the benefits of ephemeral clusters. Tune in for real-world lessons on navigating the cloud!
ANECDOTE

From Dask To Cloud Platform

  • Matthew Rocklin described creating Dask and then founding Coiled to make Python run at cloud scale for data workloads.
  • He explained Coiled grew from Dask deployment pain into a general platform for running Python on many machines.
INSIGHT

Tooling Should Fit Data Workflows

  • Matthew argued tooling should hide cloud complexity rather than force data scientists to become infra experts.
  • He contrasted Docker/Kubernetes choices with the fast-changing needs of data exploration and experimentation.
ANECDOTE

Unexpected Bill From Leftover Resources

  • Matthew recounted receiving a surprise $400 AWS bill for attached storage he had accidentally left behind.
  • He used the story to illustrate how cloud abstractions can hide billing traps.