Exploratory Data Analysis (EDA) in Machine Learning - ML 075

Adventures in Machine Learning

Getting a Good Look at the Distribution of a Y Variable

A common mistake that people make is they assume that the data are clean. If you see there's some logarithmic relationship of each feature to the target, or something that just really stands out as saying, hey, this isn't something that we can do a linear comparison on, then transform it. That definitely would have broken your model. And I actually do that stage prior to doing stuff like NA filtering and looking at the data. It's not me going into the data warehouse or the Delta Lake or the data lake and just pulling data at random and feeling like this seems like it's useful. But for me to even determine what data to attempt to build my hypothesis on, I'll go
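
One way to check for the kind of logarithmic relationship mentioned above is to compare how strongly the target correlates with the raw feature versus its log. The following is only a minimal sketch on synthetic data, not the speaker's actual workflow; the function name `linear_vs_log_correlation`, the variable names, and the generated data are all assumptions made for illustration.

```python
import numpy as np


def linear_vs_log_correlation(x, y):
    """Return Pearson r of y against x and against log(x).

    Hypothetical helper for illustration: a much higher correlation on the
    log scale hints that a log transform of the feature may be worthwhile.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log transform needs strictly positive feature values")
    raw_r = np.corrcoef(x, y)[0, 1]
    log_r = np.corrcoef(np.log(x), y)[0, 1]
    return raw_r, log_r


# Synthetic data (an assumption): the target depends on log(feature) plus noise.
rng = np.random.default_rng(0)
feature = rng.uniform(1.0, 1000.0, size=500)
target = 3.0 * np.log(feature) + rng.normal(0.0, 0.2, size=500)

raw_r, log_r = linear_vs_log_correlation(feature, target)
print(f"raw feature r = {raw_r:.3f}, log feature r = {log_r:.3f}")
```

If the log-scale correlation is clearly higher, a linear model would likely fit better on `np.log(feature)` than on the raw values. A scatter plot of each version against the target is a useful visual companion to this numeric check.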
