Search this blog

Tuesday 31 May 2011

Care needed using the Cross Distances operator

The "Cross Distances" operator is used to determine which examples in one set are closest to those in another.

Here's an example correctly showing the first example from the first set at zero distance from the first example in the second set. In other words, these two examples match.

Here's another example that behaves unexpectedly. In this case, the first example in the first set is shown at zero distance from the first example of the second set even though the attribute names and values do not match.

The reason for this is the order of the attributes in the example sets. The attributes are created in different orders and it is this that determines which attribute pairs are compared when calculating distances. You can see this ordering if you go to the meta data view and show the 'Table Index' column. The special attributes are also important.

Normally, most will find that this is not a problem. However, if you import data from two different sources that are supposed to represent the same data but which have columns in different orders, you will find that the "Cross Distances" operator will not behave as expected.

It is possible to work round this by using the "Generate Attributes" operator to recreate attributes in both example sets in the same order.

No comments:

Post a Comment