Sunday, 29 December 2013

Principal Component Analysis operator notes

As stated in the help, the operator uses the covariance matrix of the input attributes. Normalizing the attributes with the Normalize operator (Z-transformation method) before the PCA step means the PCA operator effectively uses the correlation matrix instead. As Wikipedia states, the correlation matrix can be seen as the covariance matrix of the standardized variables.
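You can check this equivalence outside RapidMiner with a small NumPy sketch. Python/NumPy and the random data are my own choices for illustration, and the function name `pca_eigenvalues` is hypothetical. The sketch z-transforms each column, takes the covariance matrix, and compares its eigenvalues with those of the correlation matrix of the raw data. It assumes the sample standard deviation (divisor N-1) is used in both the z-transformation and the covariance. If the two steps used different divisors, the eigenvalues would be rescaled by a constant, but the principal directions would not change.

```python
import numpy as np

def pca_eigenvalues(X, standardize=False):
    """Eigenvalues of the covariance matrix of X (rows = examples), largest first.
    If standardize is True, each column is z-transformed first (assumed divisor N-1)."""
    X = np.asarray(X, dtype=float)
    if standardize:
        X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(X, rowvar=False)  # np.cov uses divisor N-1 by default
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

# Illustrative data: three attributes on very different scales, two of them correlated.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
X = np.column_stack([Z[:, 0], 10 * (Z[:, 0] + Z[:, 1]), 100 * Z[:, 2]])

corr_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print(np.allclose(pca_eigenvalues(X, standardize=True), corr_eig))  # True
```

Without standardization, the third attribute dominates simply because of its scale. This is usually the reason to normalize first.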

The standard deviations of the components differ slightly from those calculated by R's princomp function. I believe this is because the two tools use different divisors when estimating the covariance matrix (N in one case, N-1 in the other). Note that R's documentation states that princomp uses N as the divisor by default, while prcomp uses N-1.
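If the divisor is the cause, the two sets of component standard deviations should differ by the same factor, sqrt(N/(N-1)), for every component. Changing the divisor rescales the covariance matrix, and therefore every eigenvalue, by N/(N-1). The sketch below is my own illustration in Python/NumPy with random data, and the name `component_sdevs` is hypothetical. It computes both versions and shows that the ratio is constant. This gives a quick way to test the divisor hypothesis on real output from the two tools.

```python
import numpy as np

def component_sdevs(X, ddof):
    """Standard deviations of the principal components (square roots of the
    covariance eigenvalues), largest first, using divisor N - ddof."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False, ddof=ddof)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return np.sqrt(np.clip(eig, 0.0, None))  # guard against tiny negative round-off

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
n = X.shape[0]

sd_n = component_sdevs(X, ddof=0)   # divisor N
sd_n1 = component_sdevs(X, ddof=1)  # divisor N - 1
ratio = sd_n1 / sd_n
print(ratio)                 # every entry equals sqrt(N / (N - 1))
print(np.sqrt(n / (n - 1)))  # about 1.026 for N = 20
```

If the ratio between the RapidMiner and R values varies from component to component, something other than the divisor is responsible.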