Search this blog

Sunday 3 April 2011

How item distribution performance is calculated for numeric attributes

The item distribution performance operator calculates its performance in the sum of squares case very simply. The number of examples in each cluster is divided by the total number of examples in all clusters. This is squared and the values for each cluster are summed.

For example, three clusters containing 1, 2 and 3 items would have an item distribution score of 0.389. This is 1/36 + 4/36 + 9/36 = 14/36 = 0.389.

For a situation where one cluster dominates and the others clusters are very small in comparison, this value will tend to 1. For the opposite situation, where the clusters have equal numbers of examples, the value tends to 1/N where N is the number of clusters.

No comments:

Post a Comment