Search this blog

Tuesday 22 November 2011

Normalizing Rows

The "Normalize" operator normalizes each numerical column (called an attribute in RapidMiner's terminology) within an example set to the desired range. Sometimes, you might want to normalize all the numerical values within a row (called an example within an example set in RapidMiner's terminology). I had to do this while understanding how term frequencies are calculated during document vector creation.

A nifty trick is to transpose the example set, normalize and then transpose it back again and the process here shows a very simple example.

This also shows the result of a document processing step which results in term frequencies for comparison. I recently found that RapidMiner performs a cosine normalization when producing term frequencies (it divides by the square root of the sum of the squares of the frequencies within a row which is equivalent to a document) and I wanted to see what differences show up when the sum of the frequencies is used instead (answer: not much with the data I was playing with).

Edit: added RapidMiner equivalent terminology for rows and columns

No comments:

Post a Comment