MLlib - A Great Way to Learn MLlib

I recommend that any ML practitioner download all the data sets. Fair warning: some of them were broken or required extensive cleanup. It's like a gold mine of ML data sets covering lots and lots of different use cases. If you have columnar data and you've already done your feature engineering work, you can implement a pipeline in less than 60 lines of code in PySpark. And it has been improved on over the years to the point where you can build a full end-to-end Spark ML implementation.

(Clip from 43:43 in the episode.)
