
Apache Spark Integration and Platform Execution for ML - ML 073
Adventures in Machine Learning
00:00
Spark
Spark has a mode where, if you don't have enough memory available to perform an RDD operation, it's going to do an interim spill. That means it starts chunking up the processing step it needs to do, using an accumulator locally, basically doing a full pass over one side of the operation at a time. In order to process data that it can't fit into memory, it has to write it to local disk on that particular executor. And there are clever ways of working around that, like salting the key and doing the interim aggregation on the salted keys.
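The "salting the key" trick mentioned here can be sketched in plain Python. This is a hedged illustration, not code from the episode: it shows the general two-stage idea, where a hot key is split into several salted sub-keys so the partial aggregation spreads across workers (or here, across dictionary entries), and the salt is stripped off in a final combine step.

```python
import random
from collections import defaultdict

def salted_aggregate(pairs, num_salts=4):
    """Two-stage sum over (key, value) pairs using key salting.

    A skewed 'hot' key is spread across num_salts sub-keys so no
    single reducer has to hold all of its values at once; a second
    pass strips the salt and combines the partial sums.
    """
    # Stage 1: partial sums per salted key
    # (in Spark, this stage would run in parallel on the executors).
    partial = defaultdict(int)
    for key, value in pairs:
        salted = (key, random.randrange(num_salts))
        partial[salted] += value

    # Stage 2: drop the salt and merge the partials into final totals.
    final = defaultdict(int)
    for (key, _salt), value in partial.items():
        final[key] += value
    return dict(final)

# Example: a heavily skewed dataset where one key dominates.
data = [("hot", 1)] * 100 + [("cold", 2)] * 3
print(salted_aggregate(data))
```

Because the salt is random, the intermediate sub-keys differ run to run, but the final totals are always the same, which is why the trick is safe for associative aggregations like sums and counts.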