
k-means clustering

Data Skeptic

CHAPTER

How Many Clusters Should I Have?

K-means clustering: if you select a larger k and therefore use more clusters, those clusters will obviously fit the data better, but at some point they undoubtedly begin to overfit it. To decide how many clusters you should have, most people will recommend the elbow method. It's based on two variables, one of which is the average distance of all points in a cluster from their associated centroid. We assume that mathematical value will be very large. As a result, we have a nice arithmetic way to score ourselves between minus one and one.
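The passage combines two ideas. The first is the elbow method, which watches how a fit measure such as the average point-to-centroid distance falls as k grows. The second is a score bounded between minus one and one, which matches the range of the silhouette coefficient. Assuming that is the score meant here, the following standard-library Python sketch computes both for k = 2 to 6 on synthetic data. The function names (`kmeans`, `best_of`, `mean_distance`, `silhouette`), the k-means++ seeding, the 10 restarts and the three-blob data are all illustrative assumptions, not anything taken from the episode.

```python
import math
import random

# Hypothetical helpers for illustration; k-means++ seeding and restarts are assumptions.


def init_plus_plus(points, k, rng):
    """k-means++ seeding: spread the initial centroids out."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next centroid is drawn with probability proportional to the squared
        # distance from the nearest centroid chosen so far.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    return centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = init_plus_plus(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (what k-means minimises)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, n_init=10):
    """Run k-means from several seeds and keep the lowest-inertia result."""
    runs = [kmeans(points, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: inertia(points, *run))


def mean_distance(points, centroids, labels):
    """Average distance from each point to its assigned centroid (elbow metric)."""
    return sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def silhouette(points, labels):
    """Mean silhouette coefficient; always lies in [-1, 1]."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: defined as 0 by convention
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the points of the nearest *other* cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic data: three well-separated Gaussian blobs (illustrative only).
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(30)]

elbow, sil = {}, {}
for k in range(2, 7):  # the silhouette needs at least two clusters
    cents, labs = best_of(data, k)
    elbow[k] = mean_distance(data, cents, labs)
    sil[k] = silhouette(data, labs)
    print(f"k={k}  mean distance={elbow[k]:.2f}  silhouette={sil[k]:.2f}")
```

For each point, the silhouette compares a, its mean distance to the rest of its own cluster, with b, its mean distance to the nearest other cluster, as (b - a) / max(a, b). Values near 1 therefore mean tight, well-separated clusters. On this three-blob data the mean distance should drop sharply from k = 2 to k = 3 and then flatten (the elbow), and the silhouette is expected to peak at k = 3. Many elbow plots use inertia (the sum of squared distances) instead of the plain average distance. The sketch computes inertia only to pick the best restart.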

