Data Science With RapidMiner: Counting clusters: part II

Tuesday, 22 March 2011

Here's an example that uses the k-means clustering algorithm to partition example sets into clusters. It uses the same generated data as an earlier post, but this time it applies several different cluster performance operators to measure how well the clustering works for each value of k.

Specifically, there are examples for the following measures:

Davies-Bouldin
Average within centroid distance
Cluster density
Sum of squares item distribution
Gini item distribution
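
To make two of these measures concrete, here is a minimal Python sketch (not the RapidMiner process itself) of the average within centroid distance and the Davies-Bouldin index, using their textbook definitions with Euclidean distance. The function names and the four-point toy data are assumptions made for illustration, and RapidMiner's operators may use different sign or scaling conventions. For both measures as written here, lower values mean tighter, better separated clusters.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points (tuples of floats)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def group_by_cluster(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def avg_within_centroid_distance(points, labels):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    total = 0.0
    for members in group_by_cluster(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) for p in members)
    return total / len(points)


def davies_bouldin(points, labels):
    """Davies-Bouldin index; needs at least two clusters with distinct centroids."""
    clusters = group_by_cluster(points, labels)
    cents = {l: centroid(m) for l, m in clusters.items()}
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = {
        l: sum(math.dist(p, cents[l]) for p in m) / len(m)
        for l, m in clusters.items()
    }
    worst_ratios = []
    for i in clusters:
        # For each cluster, take its worst (largest) similarity to any other cluster.
        worst_ratios.append(
            max(
                (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                for j in clusters
                if j != i
            )
        )
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: two obvious groups, labelled sensibly and badly.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

db_good = davies_bouldin(toy_points, good_labels)
db_bad = davies_bouldin(toy_points, bad_labels)
awcd_good = avg_within_centroid_distance(toy_points, good_labels)
awcd_bad = avg_within_centroid_distance(toy_points, bad_labels)
print(db_good, db_bad, awcd_good, awcd_bad)
```

The sensible labelling scores lower than the bad one on both measures, which is the behaviour the performance operators rely on when comparing clusterings.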

Plotting these measures against different values of k shows something like this.

[Figure: each performance measure plotted against k]
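
The sweep over k can be sketched outside RapidMiner as well. The following Python is a rough illustration under stated assumptions: a plain Lloyd-style k-means with a deterministic farthest-first seeding (chosen here only so the result is repeatable, and not necessarily what RapidMiner's k-means operator does), scored with the average within centroid distance. The toy data has three well separated groups, and all names are hypothetical.

```python
import math


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_centroids(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from its nearest already-chosen seed."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(
            max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        )
    return seeds


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean_point(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def avg_within_centroid_distance(points, labels, centroids):
    """Mean Euclidean distance from each point to its assigned centroid."""
    return sum(
        math.dist(p, centroids[l]) for p, l in zip(points, labels)
    ) / len(points)


def sweep_k(points, k_values):
    """Cluster once per k and record the performance measure for each."""
    scores = {}
    for k in k_values:
        labels, centroids = kmeans(points, k)
        scores[k] = avg_within_centroid_distance(points, labels, centroids)
    return scores


# Hypothetical toy data: three tight groups of four points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

scores = sweep_k(points, range(1, 6))
for k, score in scores.items():
    print(k, round(score, 3))
```

On data like this the measure drops sharply until k reaches the true number of groups and then flattens out, so the "elbow" in the curve is the feature to look for rather than the minimum.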

Interpreting the shape of these graphs is complex and a subject for another day. In this case, the "right" answer is k = 8, and the measures don't contradict this.

As usual, the answer does not appear by magic: clustering still requires a human to look at the results, but the performance measures help to focus attention on the important areas.
