
Apache Spark Integration and Platform Execution for ML - ML 073
Adventures in Machine Learning
00:00
Spark
Spark has a mode where, if you don't have enough memory available to perform an RDD operation, it's going to do an interim spill. That means it starts chunking up the processing step it needs to do, using an accumulator locally, basically doing a full pass over one side of the operation at a time. In order to process data that it can't fit into memory, it has to write it to local disk on that particular executor. And there are clever ways of working around that, like salting the key and doing the interim aggregation on the salted keys.
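The "salting the key" trick mentioned here can be sketched in plain Python. This is a hedged illustration, not code from the episode: it shows the general two-stage idea, where a hot key is split into several salted sub-keys so the partial aggregation spreads across workers (or here, across dictionary entries), and the salt is stripped off in a final combine step.

```python
import random
from collections import defaultdict

def salted_aggregate(pairs, num_salts=4):
    """Two-stage sum over (key, value) pairs using key salting.

    A skewed 'hot' key is spread across num_salts sub-keys so no
    single reducer has to hold all of its values at once; a second
    pass strips the salt and combines the partial sums.
    """
    # Stage 1: partial sums per salted key
    # (in Spark, this stage would run in parallel on the executors).
    partial = defaultdict(int)
    for key, value in pairs:
        salted = (key, random.randrange(num_salts))
        partial[salted] += value

    # Stage 2: drop the salt and merge the partials into final totals.
    final = defaultdict(int)
    for (key, _salt), value in partial.items():
        final[key] += value
    return dict(final)

# Example: a heavily skewed dataset where one key dominates.
data = [("hot", 1)] * 100 + [("cold", 2)] * 3
print(salted_aggregate(data))
```

Because the salt is random, the intermediate sub-keys differ run to run, but the final totals are always the same, which is why the trick is safe for associative aggregations like sums and counts.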