Apache Spark (Pt. 2): MLlib - ML 074

Adventures in Machine Learning

Spark MLlib vs. scikit-learn

With distributed Hyperopt, you can run 20 of these in parallel and iterate over that 50 times. So you can test far more configurations, because the trials run in parallel on different machines. We also have pandas UDFs, where we can take an incredibly large DataFrame, a dataset that's loaded into Spark, and chunk it up. This makes it simpler to use Spark MLlib, and there are plenty of use cases in industry like that. You can build things like fraud detection algorithms at massive scale.
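To make the "20 in parallel, 50 times" idea concrete, here is a minimal sketch using only the Python standard library. It is not Hyperopt and not Spark: it swaps in a plain random search on local threads, whereas Hyperopt picks new trials adaptively and a Spark-backed run would place them on different machines. The objective function, the parameter names `lr` and `depth`, and their ranges are all made up for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

PARALLELISM = 20  # trials evaluated at once (the "20 in parallel")
ROUNDS = 50       # sequential rounds (the "iterate 50 times")


def objective(params):
    """Hypothetical loss, smallest at lr=0.1 and depth=6 (an assumption)."""
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def sample_params(rng):
    """Draw one random configuration from an illustrative search space."""
    return {"lr": rng.uniform(0.001, 0.5), "depth": rng.randint(2, 12)}


def parallel_search(seed=0):
    """Run ROUNDS batches of PARALLELISM trials and keep the best one."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    trials = 0
    with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
        for _ in range(ROUNDS):
            # Sample the batch up front so results are reproducible per seed.
            batch = [sample_params(rng) for _ in range(PARALLELISM)]
            # pool.map returns losses in the same order as the batch.
            losses = list(pool.map(objective, batch))
            trials += len(batch)
            for loss, params in zip(losses, batch):
                if loss < best_loss:
                    best_loss, best_params = loss, params
    return best_params, best_loss, trials


best_params, best_loss, trials = parallel_search()
print(trials, best_params, best_loss)
```

The point of the sketch is the arithmetic: 50 rounds of 20 concurrent trials gives 1,000 evaluated configurations in roughly the wall-clock time of 50 sequential ones, assuming each worker has its own resources.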
