#519: Data Science Cloud Lessons at Scale

Sep 18, 2025

Matthew Rocklin, the creator of Dask and co-founder of Coiled, and Nat Tabris, a staff software engineer at Coiled, discuss the complexities of running Python workloads at cloud scale. They describe what it looks like to outgrow local data processing and give a live demo of spinning up a 1,000-core cluster. The pair also shares insights on choosing between pandas and Polars, optimizing costs, and the benefits of ephemeral clusters.

ANECDOTE

From Dask To Cloud Platform

Matthew Rocklin described creating Dask and then co-founding Coiled to make Python run at cloud scale for data workloads.
He explained Coiled grew from Dask deployment pain into a general platform for running Python on many machines.

INSIGHT

Tooling Should Fit Data Workflows

Matthew argued tooling should hide cloud complexity rather than force data scientists to become infra experts.
He contrasted Docker/Kubernetes choices with the fast-changing needs of data exploration and experimentation.

ANECDOTE

Unexpected Bill From Leftover Resources

Matthew recounted receiving a surprise $400 AWS bill for attached storage he had accidentally left running.
He used the story to illustrate how cloud abstractions can hide billing traps.

Links from the show