
Apache Spark (Pt. 2): MLlib - ML 074
Adventures in Machine Learning
Spark MLlib vs scikit-learn
With distributed Hyperopt, you can run 20 of these in parallel, and I'm going to iterate over that 50 times. So you can test way more configurations, because you're running them in parallel on different machines. We have pandas UDFs, where we can take an incredibly large DataFrame, a dataset that's loaded into Spark, and chunk it up. This makes it simpler to use Spark MLlib, and there are plenty of use cases in industry like that. You can do things like fraud detection algorithms at massive scale.
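A minimal local sketch of running hyperparameter trials in parallel. It assumes "20 in parallel, 50 times" means 50 total trials with at most 20 running at once, which in Hyperopt terms is roughly `fmin(..., max_evals=50, trials=SparkTrials(parallelism=20))`. The speaker may mean something else. Here a thread pool on one machine stands in for Spark workers, and plain random search stands in for Hyperopt's TPE algorithm. The objective, the search space, and the names `objective`, `sample_params`, and `parallel_random_search` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


# Hypothetical objective: a toy "validation loss" minimized at lr=0.1, depth=6.
# In practice this would train and score a model for one hyperparameter setting.
def objective(params):
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


# Hypothetical search space: learning rate in [0.001, 1.0], tree depth in 2..12.
def sample_params(rng):
    return {"lr": rng.uniform(0.001, 1.0), "depth": rng.randint(2, 12)}


def parallel_random_search(objective, parallelism=20, max_evals=50, seed=0):
    """Evaluate max_evals random configurations, at most `parallelism` at a time.

    A local thread pool stands in for Spark workers; random search stands in
    for Hyperopt's TPE (which uses earlier results to choose later trials).
    """
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(max_evals)]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves input order, so losses[i] belongs to candidates[i].
        losses = list(pool.map(objective, candidates))
    best_loss, best_params = min(zip(losses, candidates), key=lambda pair: pair[0])
    return best_params, best_loss, len(losses)


if __name__ == "__main__":
    params, loss, n_trials = parallel_random_search(objective)
    print(f"best of {n_trials} trials: {params} (loss={loss:.4f})")
```

Random search samples every configuration up front, so it parallelizes trivially. An adaptive method like TPE proposes new trials from earlier results, so running more trials at once gives it less feedback per proposal. This is why the parallelism setting is a tradeoff rather than "higher is always better".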