Tuesday, 22 March 2011

Counting clusters: part II

Here's an example that uses the k-means clustering algorithm to partition example sets into clusters. It uses the same generated data as here, but this time it applies several cluster performance operators to measure how well the clustering works for each value of k.

Specifically, there are examples for:
  • Davies-Bouldin
  • Average within centroid distance
  • Cluster density
  • Sum of squares item distribution
  • Gini item distribution
Plotting these measures against different values of k shows something like this.
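
To make the idea concrete outside of the process itself, here is a minimal Python sketch of the same kind of sweep. It is not the process from this post: the data generator (`make_blobs`), the plain Lloyd's k-means, and the exact metric definitions are all assumptions made for illustration. In particular, "average within centroid distance" is taken here as the mean Euclidean distance from each point to its cluster centroid, and Davies-Bouldin follows the usual textbook definition, where lower values suggest better separated clusters. The performance operators used in the post may define or sign these values differently.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(points, k, rng, iterations=50):
    """Plain Lloyd's k-means. Returns (centroids, clusters), empty clusters dropped."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    kept = [(c, members) for c, members in zip(centroids, clusters) if members]
    return [c for c, _ in kept], [members for _, members in kept]


def avg_within_centroid_distance(centroids, clusters):
    """Mean distance from each point to its own centroid (assumed definition)."""
    total = sum(dist(p, c) for c, members in zip(centroids, clusters) for p in members)
    return total / sum(len(members) for members in clusters)


def davies_bouldin(centroids, clusters):
    """Textbook Davies-Bouldin index: lower means tighter, better separated clusters."""
    scatter = [sum(dist(p, c) for p in members) / len(members)
               for c, members in zip(centroids, clusters)]
    n = len(centroids)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(n) if j != i)
             for i in range(n)]
    return sum(worst) / n


def make_blobs(seed=0, per_cluster=30, sigma=1.0):
    """Hypothetical stand-in data: 8 Gaussian blobs on a 4 x 2 grid."""
    rng = random.Random(seed)
    centers = [(10.0 * x, 10.0 * y) for x in range(4) for y in range(2)]
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma))
            for cx, cy in centers for _ in range(per_cluster)]


def sweep(points, ks, seed=0):
    """Cluster once per k and record both measures for each k."""
    rng = random.Random(seed)
    results = {}
    for k in ks:
        centroids, clusters = kmeans(points, k, rng)
        results[k] = (davies_bouldin(centroids, clusters),
                      avg_within_centroid_distance(centroids, clusters))
    return results


if __name__ == "__main__":
    for k, (db, awd) in sweep(make_blobs(), range(2, 13)).items():
        print(f"k={k:2d}  Davies-Bouldin={db:.3f}  avg within-centroid distance={awd:.3f}")
```

Each k is clustered only once from a random start, so a single run can land in a poor local optimum. Repeating the run with several seeds and keeping the best result per k would give smoother curves.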

Interpreting the shape of these graphs is complex and a subject for another day. In this case, the "right" answer is 8, and the measures don't contradict this.

As usual, the answer does not appear by magic. Clustering still requires a human to look at the results, but the performance measures help focus attention on the important areas.
