
k-means clustering
Data Skeptic
How Many Clusters Should I Have?
K-means clustering. If you select a larger k and therefore use more clusters, those clusters will fit the data more closely. But at some point they begin to overfit the data. To decide how many clusters you should have, most people will recommend the elbow method. It's based on two variables: the number of clusters, k, and the average distance of all points in a cluster from their associated centroid. We assume that mathematical value will be very large. And as a result, we have a nice arithmetic way that we can score ourselves between minus one and one.
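
To make the elbow method concrete, here is a minimal Python sketch under stated assumptions. The data is synthetic (three well-separated 2-D groups), the helper names `farthest_first`, `kmeans`, and `average_distance` are illustrative, and the deterministic farthest-first seeding is just one convenient choice for reproducibility; none of this comes from the episode itself.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns the final list of centroids."""
    centroids = farthest_first(points, k)
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distance(points, centroids):
    """Average distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


# Synthetic data: 50 points around each of three hypothetical centers.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in centers
    for _ in range(50)
]

# Score each candidate k; the "elbow" is where the curve stops dropping steeply.
scores = {k: average_distance(points, kmeans(points, k)) for k in range(1, 7)}
for k, score in scores.items():
    print(f"k={k}: average distance to centroid = {score:.2f}")
```

On data like this, the average distance should fall sharply as k goes from 1 to 3 and then flatten out. That bend is the elbow, and it suggests k = 3; adding clusters beyond it mostly fits noise.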