Wednesday, 24 April 2013

Operators that deserve to be better known: part VI

Remove Unused Values

This operator removes nominal values that an attribute declares as possible but that no example in the example set actually uses. Such unused values typically appear after the example set has been sampled or filtered.

As an example, suppose an attribute called Fruit has the possible values Apple, Banana, Orange and Pear, and some filtering is done to remove all examples except those with Apple. The attribute still lists Banana, Orange and Pear as possible values, even though no example in the filtered example set uses them. This can be seen in the meta data view for the example set.
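To make the Fruit example concrete, here is a minimal Python sketch. It is only an analogy and not how RapidMiner stores data internally: it assumes a nominal attribute can be modelled as a list of declared values plus one index per row. The names `fruit_values` and `fruit_codes` are made up for illustration.

```python
# Hypothetical model of a nominal attribute (an assumption for illustration):
# a list of declared possible values, plus one index into that list per row.
fruit_values = ["Apple", "Banana", "Orange", "Pear"]
fruit_codes = [0, 1, 0, 3, 2, 0]

# Filter the rows so that only Apple remains. The list of declared
# values is not touched by the filter.
apple = fruit_values.index("Apple")
filtered_codes = [code for code in fruit_codes if code == apple]

# Work out which declared values are no longer used by any remaining row.
used = {fruit_values[code] for code in filtered_codes}
unused = [value for value in fruit_values if value not in used]

print(len(filtered_codes), "rows left")
print("declared:", fruit_values)
print("unused:", unused)
```

After the filter, every remaining row is an Apple, yet the attribute still declares all four values.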

This is not normally a problem, but if you have a giant data set with lots of nominal values, the resulting example set can be slow to process even after sampling or filtering. A particular case is where each attribute value is a line of text. I had a situation like this where I sampled 2 million rows down to 100 in order to get my process working, only to find that the seemingly small 100-row example set was taking a long time to load.

This is easily resolved by using the Remove Unused Values operator. As an aside, in the case of text processing you could also simply convert the polynominal attribute to type text.
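The idea behind the operator can be sketched in a few lines of Python. This is not RapidMiner's implementation, just an illustration under the assumption that a nominal attribute is a list of declared values plus one index per row; `remove_unused_values` is a hypothetical helper written for this post.

```python
def remove_unused_values(values, codes):
    """Return a compacted (values, codes) pair.

    `values` is the list of declared nominal values and `codes` holds one
    index into `values` per row (a made-up representation for illustration).
    Only values that some row refers to are kept, in their original order.
    """
    used = sorted(set(codes))                         # old indices still in use
    remap = {old: new for new, old in enumerate(used)}
    new_values = [values[old] for old in used]        # drop the unused values
    new_codes = [remap[code] for code in codes]       # re-point rows at new indices
    return new_values, new_codes


# A filtered example set in which only Apple survives.
values = ["Apple", "Banana", "Orange", "Pear"]
codes = [0, 0, 0]

compact_values, compact_codes = remove_unused_values(values, codes)
print(compact_values)
print(compact_codes)
```

The rows themselves are unchanged in meaning; only the list of possible values shrinks, which is what matters when that list would otherwise hold millions of entries.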
