Scaling Pandas Operations Out to Multiple Machines

Spark natively speaks the pandas API. So that's sort of a key thing to take away from this, because as I go through these other things, you'll see that recurring. Then we have Dask, which is a scheduler, and you can think of it as sort of an alternate implementation of Spark. But it has some things that basically allow you to scale pandas operations out to multiple machines. There's another one I just heard about recently, Terality. I'll call this a serverless, auto-scalable pandas service. And basically you throw your pandas code on that, and it scales it out and does its magic, but it's speaking pandas. That was a day to

Effective Pandas Patterns For Data Engineering

Data Engineering Podcast