Spark MLlib vs scikit-learn

With distributed Hyperopt, you can do 20 of these in parallel and iterate over that 50 times. So you can test way more stuff, because you're just doing it in parallel on different machines. We have pandas UDFs, where we can take an incredibly large DataFrame, a dataset that's loaded in Spark, and we can chunk that up. This makes it simpler to use Spark ML, and there are plenty of use cases in industry like that. You can build things like fraud detection algorithms at massive scale.
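To make the "20 in parallel, 50 times" arithmetic concrete, here is a minimal sketch in plain Python. It is not the Hyperopt or Spark API. A local thread pool stands in for the worker machines, plain random search stands in for Hyperopt's search algorithm, and `objective`, `sample_params`, and `run_search` are hypothetical names invented for illustration. The loss function is a toy placeholder for training and scoring a real model.

```python
import random
from concurrent.futures import ThreadPoolExecutor

PARALLELISM = 20  # trials evaluated at once ("20 of these in parallel")
ROUNDS = 50       # sequential rounds ("iterate over that 50 times")


def objective(params):
    """Hypothetical loss; stands in for training and scoring a model."""
    lr, depth = params["lr"], params["depth"]
    return (lr - 0.1) ** 2 + (depth - 6) ** 2 / 100.0


def sample_params(rng):
    """Draw one random configuration (random search, not TPE)."""
    return {"lr": rng.uniform(0.001, 0.5), "depth": rng.randint(2, 12)}


def run_search(parallelism=PARALLELISM, rounds=ROUNDS, seed=0):
    rng = random.Random(seed)
    best = None      # (loss, params) of the best trial seen so far
    evaluated = 0
    # Assumption: threads play the role of separate worker machines.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for _ in range(rounds):
            # Sample a whole batch, then evaluate it concurrently.
            batch = [sample_params(rng) for _ in range(parallelism)]
            losses = list(pool.map(objective, batch))
            evaluated += len(batch)
            for params, loss in zip(batch, losses):
                if best is None or loss < best[0]:
                    best = (loss, params)
    return best, evaluated


best, evaluated = run_search()
print(f"evaluated {evaluated} configurations, best loss {best[0]:.4f}")
```

With these settings the search covers 20 × 50 = 1000 configurations while only waiting for 50 sequential rounds. On a real cluster, Hyperopt's `SparkTrials` class takes a `parallelism` argument that plays a similar role, though whether the speaker's setup maps exactly onto batches like this is an assumption.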

Play episode from 36:10

