

#519: Data Science Cloud Lessons at Scale
Sep 18, 2025
In this engaging discussion, Matthew Rocklin, the creator of Dask and co-founder of Coiled, and Nat Tabris, a staff software engineer at Coiled, delve into the complexities of running Python workloads at cloud scale. They unveil the reality of outgrowing local data processing, showcasing a live demo of spinning up a 1,000-core cluster. The pair shares savvy insights on choosing between pandas and Polars, optimizing costs, and the benefits of ephemeral clusters. Tune in for real-world lessons on navigating the cloud!
AI Snips
From Dask To Cloud Platform
- Matthew Rocklin described founding Dask and then Coiled to make Python run at cloud scale for data workloads.
- He explained Coiled grew from Dask deployment pain into a general platform for running Python on many machines.
Tooling Should Fit Data Workflows
- Matthew argued tooling should hide cloud complexity rather than force data scientists to become infra experts.
- He contrasted Docker/Kubernetes choices with the fast-changing needs of data exploration and experimentation.
Unexpected Bill From Leftover Resources
- Matthew recounted receiving a surprise $400 AWS bill for attached storage he had accidentally left behind.
- He used the story to illustrate how cloud abstractions can hide billing traps.