Search this blog

Tuesday, 25 June 2013

Setting outliers to missing

I had some attributes whose values were very out of range and because I was in a hurry I couldn't go back to the source data and eliminate them there. What I wanted to do was to set the value to missing if it was greater than a certain value.

So I used the Generate Attributes operator like so.

In other words if (a1 > 5) then set a1 to missing (by dividing 0 by 0).

It's OK to modify the value of an existing attribute and not create a new attribute at all. I'm sure this used not to work so an enhancement has sneaked in - maybe my memory is playing tricks - I don't mind though - it works.

Saturday, 1 June 2013

Using Groovy to make an arbitrary example set

Here's a Groovy process to make an example set with the number of attributes you want as well as the number of examples.

Integer numberOfAttributes = operator.getProcess().macroHandler.getMacro("numberOfAttributes").toInteger();
Integer numberOfExamples = operator.getProcess().macroHandler.getMacro("numberOfExamples").toInteger();
Attribute[] attributes = new Attribute[numberOfAttributes];
for (i = 0; i < numberOfAttributes; i++) {
name = "att_" + i.toString();
attributes[i] = AttributeFactory.createAttribute(name, Ontology.STRING);
MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory ROW_FACTORY = new DataRowFactory(0);
String[] values = new String[numberOfAttributes];
for (j = 0; j < numberOfExamples; j++){
for (i = 0; i < numberOfAttributes; i++) {
values[i] = 0;
DataRow row = ROW_FACTORY.create(values, attributes);
ExampleSet exampleSet = table.createExampleSet();
return exampleSet;

The process uses two macros to dictate the size of the example set as follows

  • numberOfAttributes
  • numberOfExamples
These are set in the process context but can easily be defined in other ways.

The attribute names are prefixed with "att_" and the default value is 0. A bit of coding can change this.

Of course, the operators that are already available can be used to recreate this but my personal best is 8 operators to recreate the Groovy script above. I figured a 7 click saving was worth investing a bit of time to get.

Edit: I improved my personal best to 4 operators.