Search this blog

Sunday 16 January 2011

Be careful using example sets inside loops

If you have an example set that is used inside a loop it's important to remember that any changes made to attributes will be retained between iterations. To get round this, you must use the materialize data operator. This creates a new copy of the data at the expense of increased memory usage.

Here's an example that shows more and more noise being added to an example set before doing a linear regression.

If you plot noise against correlation and set the colour of the data points to be the attribute "before" you will see that when before = 2 (corresponding to no materialize operation), correlation drops more quickly than it should as noise increases. The noise-correlation curve is more reasonable when before = 1 (corresponding to enabling the materialize operation). The inference is that noise is being added on top of noise from a previous loop to make correlation poorer than expected.


In the example, the order in which the parameters is set is also important. If you get this wrong, the parameters will be reset at the wrong time relative to the materialize operator which causes incorrect results.

No comments:

Post a Comment