Data Science With RapidMiner: Rename by generic names and creating a model: which attributes are used?

Sunday, 21 April 2013

Rename by generic names and creating a model: which attributes are used?

The Rename by Generic Names operator renames attributes so that they follow a simple naming convention: a generic stem followed by an incrementing counter. You would use it, for example, to get rid of punctuation or mathematical symbols that would prevent the Generate Attributes operator from working.

As an example, if you had regular attributes like

Average(height)
Minimum(height)
Variance(height)

The rename would yield something like

att1
att2
att3

The new names are easier to use in expressions, but they say nothing about what each attribute contains.
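To make the naming convention concrete, here is a minimal Python sketch of the same idea outside RapidMiner. It is not the operator's implementation; the function name `rename_by_generic_names` and the `stem` parameter are made up for illustration. It also keeps a lookup from new name to old name, which is one way to recover the meaning that the generic names lose.

```python
def rename_by_generic_names(names, stem="att"):
    """Return generic names (stem + counter) and a lookup back to the originals.

    Illustrative only: this mimics the naming convention described above,
    it is not how RapidMiner implements the operator.
    """
    new_names = [f"{stem}{i}" for i, _ in enumerate(names, start=1)]
    lookup = dict(zip(new_names, names))  # new name -> original name
    return new_names, lookup


original = ["Average(height)", "Minimum(height)", "Variance(height)"]
renamed, lookup = rename_by_generic_names(original)

print(renamed)         # ['att1', 'att2', 'att3']
print(lookup["att2"])  # Minimum(height)
```

Keeping the lookup around means a result expressed in terms of att2 can still be read as Minimum(height) later on.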

The other day, I stumbled on an odd side effect of this when building linear regression models on renamed attributes. Fortunately, I don't think it's a real problem, and there is a workaround anyway.

First, here is an ultra-simple process that builds a linear regression model on some fake data whose attributes have been renamed generically. The example set it produces looks like this.

The regression model looks like this.

How odd: the model is described using the names the attributes had before the rename. This is confusing, but as far as I can tell the models and weights are fine when they are used in a process. The original names are in fact still in the example set and can be seen in the Meta Data View by showing the Constructions column. This points to the Materialize Data operator as a workaround. With this operator added just after the rename, the model comes out as follows.

Less confusing for a human.
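The behaviour can be mimicked with a toy Python model. This is only a sketch of what I observed, not RapidMiner's actual code: the `Attribute` class, the helper functions, and the weights are all invented for illustration. It assumes that each attribute carries both a visible name and a construction, that the rename changes only the name, that the model description is built from the construction, and that materializing produces a fresh copy whose construction matches the current name.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str          # what the example set shows
    construction: str  # what the Constructions column shows


def rename_generic(attributes, stem="att"):
    # Assumption: only the visible name changes; the construction is kept.
    return [Attribute(f"{stem}{i}", a.construction)
            for i, a in enumerate(attributes, start=1)]


def materialize(attributes):
    # Assumption: a fresh copy whose construction is reset to the current name.
    return [Attribute(a.name, a.name) for a in attributes]


def describe_model(attributes, weights, intercept):
    # Assumption: the description is labelled from the construction.
    terms = [f"{w:.3f} * {a.construction}" for a, w in zip(attributes, weights)]
    return " + ".join(terms) + f" + {intercept:.3f}"


attrs = [Attribute(n, n) for n in ("Average(height)", "Minimum(height)")]
weights, intercept = [0.5, 1.25], 2.0  # made-up numbers, not a fitted model

renamed = rename_generic(attrs)
before = describe_model(renamed, weights, intercept)
after = describe_model(materialize(renamed), weights, intercept)

print(before)  # 0.500 * Average(height) + 1.250 * Minimum(height) + 2.000
print(after)   # 0.500 * att1 + 1.250 * att2 + 2.000
```

The weights are identical in both descriptions; only the labels differ, which matches the observation that the model itself is fine.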
