
Apache Spark (Pt. 2): MLlib - ML 074
Adventures in Machine Learning
MLlib - A Great Way to Learn MLlib
I recommend any ML practitioner download all the data sets. Fair warning: some of them are broken or require extensive cleanup. It's like a gold mine of ML data sets, covering lots and lots of different use cases. If you have columnar data and you've already done your feature engineering work, you can implement a pipeline in less than 60 lines of code in PySpark. And it's been improved on over the years to the point where you can build a full end-to-end Spark ML implementation.