N starts at 1 and ends at the largest bin number. N has no leading zeros, so when the names are sorted alphabetically, range10 comes before range2. With the histogram plotter this is OK because the nominal values have an implicit order that gets used. With the advanced plotter, however, the histogram comes out wrong.
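To see why the names sort badly, here is a small Python sketch. It runs outside RapidMiner, and the twelve bin names are made up for illustration. It only shows that plain string sorting puts range10 ahead of range2, while sorting on the number itself gives the order you would expect.

```python
# Hypothetical bin names of the form rangeN (N = 1..12), for illustration only.
names = [f"range{n}" for n in range(1, 13)]

# Plain string sorting compares character by character,
# so "range10" sorts before "range2".
alphabetical = sorted(names)

# Sorting on the integer suffix recovers the intended numerical order.
numerical = sorted(names, key=lambda s: int(s[len("range"):]))

print(alphabetical)
print(numerical)
```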

Here's a histogram, produced using the advanced plotter, showing the original ordering. The data is 10,000 examples generated by multiplying 5 random numbers together and normalizing to the range 0 to 1.

As can be seen, the bins are not in the numerical order of the underlying values.

This can be fixed by using regular expressions and the "Replace" operator.

I'm not enough of a regular expression ninja to do this in one operator so I had to use two.

So, in the first "Replace" operator, set the "replace what" field to

range(\d+.*)

and set the "replace by" field to

range0000$1

This changes every value so that four zeros are inserted before the number in its name.

In the second "Replace" operator, set the "replace what" field to

range0+(\d{4})(.*)

and set the "replace by" field to

range$1$2

This strips the surplus zeros and keeps the last four digits, so every number of up to four digits is left-padded with zeros to the same length (range2 becomes range0002 and range10 becomes range0010).

Be aware that you might have to tweak these numbers if you have a different number of bins: the values above cope with up to 9,999 of them.
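If you want to check the two expressions before wiring them into a process, the sketch below applies them with Python's re module. This is only a stand-in for the two "Replace" operators, not RapidMiner itself: the function name pad_range_name and the sample values are made up for illustration, and Python writes the back-references as \g<1> where the operator fields use $1.

```python
import re

def pad_range_name(value: str) -> str:
    """Apply the two replacements in sequence (hypothetical helper)."""
    # Step 1: insert four zeros in front of the number.
    step1 = re.sub(r"range(\d+.*)", r"range0000\g<1>", value)
    # Step 2: drop the surplus zeros, keeping the last four digits.
    return re.sub(r"range0+(\d{4})(.*)", r"range\g<1>\g<2>", step1)

# Made-up nominal values; the bracketed part stands for the bin limits.
values = ["range10 [0.9 - 1.0]", "range2 [0.1 - 0.2]", "range1 [0.0 - 0.1]"]
padded = sorted(pad_range_name(v) for v in values)

for p in padded:
    print(p)
```

After padding, a plain alphabetical sort puts the bins in numerical order. A five-digit number such as range12345 comes back unchanged, which is where the four-digit limit shows up.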

The end result is then a histogram like this

Now the ordering is the same as the implied numerical ordering.
