
Exploratory Data Analysis (EDA) in Machine Learning - ML 075
Adventures in Machine Learning
Getting a Good Look at the Distribution of a Y Variable
A common mistake that people make is they assume that the data are clean. If you see there's some logarithmic relationship of each feature to the target, or something that just really stands out as saying, hey, this isn't something that we can do a linear comparison on, then transform it. That definitely would have broken your model. And I actually do that stage prior to doing stuff like NA filtering and looking at the data. It's not me going into the data warehouse or the Delta Lake or the data lake and just pulling data at random and feeling like this seems like it's useful. But for me to even determine what data to use as I attempt to build my hypothesis, I'll go
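The point about spotting a logarithmic relationship and transforming it can be sketched in a few lines. This is a minimal illustration, not the speakers' own workflow: the synthetic data, the `pearson` helper, and the variable names are all assumptions made up for the example. It compares how linear the feature-to-target relationship looks before and after a log transform of the feature.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D arrays (hypothetical helper)."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic data (an assumption for illustration): the target depends on
# the logarithm of the feature, plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=500)
y = 3.0 * np.log(x) + rng.normal(0.0, 0.1, size=500)

# Linear comparison on the raw feature versus the log-transformed feature.
raw_corr = pearson(x, y)
log_corr = pearson(np.log(x), y)

print(f"correlation with raw feature: {raw_corr:.3f}")
print(f"correlation with log feature: {log_corr:.3f}")
```

If the correlation with the log-transformed feature is clearly stronger, that is one hint that a linear model would fit better after the transform. A scatter plot of the feature against the target is still the more reliable check, since a single correlation number can hide a lot.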