Adventures in Machine Learning cover image

Apache Spark Integration and Platform Execution for ML - ML 073

Adventures in Machine Learning

00:00

Why Are Rows Split Up Instead of Columns?

Charles Maxwell: Why do you partition by row and not by column? So why are rows split up instead of columns? Well, that's a good question. That's a, that's an architecture question for why Spark was designed in the way it was. In order to get the maximum amount of parallel computation while maintaining ability to reference other elements of a contained data vector,. You want to make sure all of the column data for a row is referenceable by that column. Most of the conditional logic you're going to have is going to be referenced by that particular row of data. The only way that to really separate that concept of column wise operations is to transpose a table. And there

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app