
Exploratory Data Analysis (EDA) in Machine Learning - ML 075
Adventures in Machine Learning
How to Do EDA on Large Data Sets?
If it takes an hour to run a histogram, well, you just can't iterate. So it's really important to downsample. Usually, narrow down the feature set before you downsample, and then look at graphs. If you have a 50/50 split on your data set, you can kind of get away with not worrying too much about any additional processing steps or fancy sampling that you need to do in order to create train, test, and holdout sets. But if you have a one-to-one-million ratio here, you're talking about fraud detection, or churn detection on a business that has a really sticky baseline of customer support for your product. You're going to have a massive skew.
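To make the sampling advice concrete, here is a minimal sketch, not the speaker's own method. It assumes rows are plain dictionaries and uses a hypothetical label column named is_fraud. The helper names downsample_majority and stratified_split, the 50:1 ratio, and the 70/15/15 split are all illustrative assumptions. The first helper keeps every rare row and samples the common class so plots run quickly. The second splits each class separately so that the rare class appears in train, test, and holdout.

```python
import random
from collections import defaultdict


def downsample_majority(rows, label_key, minority_label, ratio=10, seed=0):
    """Keep every minority row plus at most `ratio` majority rows per minority row."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == minority_label]
    majority = [r for r in rows if r[label_key] != minority_label]
    keep = min(len(majority), ratio * max(len(minority), 1))
    sample = minority + rng.sample(majority, keep)
    rng.shuffle(sample)
    return sample


def stratified_split(rows, label_key, percents=(70, 15, 15), seed=0):
    """Split into train/test/holdout, splitting each class on its own.

    `percents` are integer percentages; the holdout takes whatever remains
    after the train and test slices, so no row is dropped.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for r in rows:
        by_label[r[label_key]].append(r)
    train, test, holdout = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = len(group) * percents[0] // 100
        n_test = len(group) * percents[1] // 100
        train += group[:n_train]
        test += group[n_train:n_train + n_test]
        holdout += group[n_train + n_test:]
    return train, test, holdout


# Synthetic, heavily skewed data (illustrative only): 100,000 rows, 20 positives.
rows = [{"id": i, "is_fraud": 1 if i % 5000 == 0 else 0} for i in range(100_000)]

# Small sample for fast plotting: all 20 positives plus 50 negatives per positive.
eda_sample = downsample_majority(rows, "is_fraud", minority_label=1, ratio=50)

# Per-class split of the full data so every partition contains positives.
train, test, holdout = stratified_split(rows, "is_fraud")
```

With this synthetic data, eda_sample has 1,020 rows, and the 20 positives land 14, 3, and 3 across train, test, and holdout. A purely random split at a far more extreme skew could easily leave a partition with no positives at all, which is why the split is done class by class. Note that the downsampled set no longer reflects the true class ratio, so it suits plotting and exploration rather than estimating base rates.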