
Apache Spark Integration and Platform Execution for ML - ML 073
Adventures in Machine Learning
00:00
Why Are Rows Split Up Instead of Columns?
Charles Maxwell: Why do you partition by row and not by column? So why are rows split up instead of columns? Well, that's a good question. That's a, that's an architecture question for why Spark was designed in the way it was. In order to get the maximum amount of parallel computation while maintaining ability to reference other elements of a contained data vector,. You want to make sure all of the column data for a row is referenceable by that column. Most of the conditional logic you're going to have is going to be referenced by that particular row of data. The only way that to really separate that concept of column wise operations is to transpose a table. And there
Transcript
Play full episode