Thursday, 30 December 2010

Counting clusters

The cluster count operator had a bug, which the RapidMiner developers have now fixed.

Here's an example process that shows it working (warning: it takes 10 minutes to run).

Plot a histogram of clusterNumber to get an idea of where a good clustering might be (hint: 8 is the "correct" answer).

Edit: This process uses the DBScan operator.
Edit2: I changed the process to create better sample data so the expected number of clusters is more obvious.
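
For anyone who wants to try the idea outside RapidMiner, here is a rough Python sketch of the same approach: run a density-based clustering over a grid of parameter values, record how many clusters each run finds, and look at the histogram of those counts. It is not the RapidMiner process itself. The DBSCAN here is a plain textbook version written out by hand, and the sample data (make_blobs), the parameter grid, and every function name are assumptions made up for illustration.

```python
import math
from collections import Counter
from random import Random

NOISE = -1  # label for points that belong to no cluster


def make_blobs(n_clusters=8, per_cluster=30, seed=0):
    """Hypothetical sample data: 2x2 square blobs whose centres are 10 apart."""
    rng = Random(seed)
    points = []
    for k in range(n_clusters):
        cx, cy = 10.0 * (k % 4), 10.0 * (k // 4)
        for _ in range(per_cluster):
            points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
    return points


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_points):
    """Plain O(n^2) DBSCAN; returns one label per point (NOISE for outliers)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_points:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point: keep growing
        cluster += 1
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {NOISE})


def sweep(points, eps_values, min_points_values):
    """Run DBSCAN once per parameter pair; return the cluster count of each run."""
    return [count_clusters(dbscan(points, eps, m))
            for eps in eps_values for m in min_points_values]


if __name__ == "__main__":
    points = make_blobs()
    eps_values = [0.5 * step for step in range(1, 21)]  # 0.5, 1.0, ... 10.0
    histogram = Counter(sweep(points, eps_values, [3, 5, 10]))
    for n_clusters in sorted(histogram):
        print(f"{n_clusters:3d} clusters | {'#' * histogram[n_clusters]}")
```

With this made-up data, every run with eps between 3.0 and 7.5 finds exactly 8 clusters, because each blob fits inside that radius and the gaps between blobs are wider than it. So 8 should show up as the tallest bar in the printed histogram. Real data will rarely be this clean, but a count that stays stable across a wide range of parameters is the thing to look for.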