Thursday, 23 May 2013

Finding the next Sunday

I was asked to create a process to find the next Sunday from a given date.

It's a bit clunky, but here is a screenshot of the Generate Attributes operator with the various calculations. It shows how much flexibility is hidden within this operator.

I'm not sure what would happen across a year boundary, so it is worth running some extra checks on dates in late December before relying on it.

I can think of a couple of other ways this could be done but I might ask for beer or money to work out the details.
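
For anyone who wants to cross-check the results outside RapidMiner, including the year-boundary case, here is a minimal sketch in plain Java. It is not the Generate Attributes approach from the screenshot. The class and method names are made up for illustration, and it assumes that "next Sunday" means the first Sunday strictly after the given date, so a Sunday maps to the following Sunday rather than to itself.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper class, not part of RapidMiner.
class NextSunday {

    // Uses the standard java.time adjuster: first Sunday strictly after 'date'.
    static LocalDate nextSunday(LocalDate date) {
        return date.with(TemporalAdjusters.next(DayOfWeek.SUNDAY));
    }

    // The same result with explicit day arithmetic, closer in spirit to
    // building the calculation by hand from a day-of-week number.
    static LocalDate nextSundayByArithmetic(LocalDate date) {
        int dow = date.getDayOfWeek().getValue(); // Monday = 1 ... Sunday = 7
        int daysToAdd = 7 - (dow % 7);            // Sunday -> 7, Monday -> 6, Saturday -> 1
        return date.plusDays(daysToAdd);
    }

    public static void main(String[] args) {
        System.out.println(nextSunday(LocalDate.of(2013, 5, 23)));  // 2013-05-26
        System.out.println(nextSunday(LocalDate.of(2013, 12, 30))); // 2014-01-05
    }
}
```

Because the date type carries the year with it, adding days rolls over from December into January without any special handling. If a Sunday should map to itself instead, `TemporalAdjusters.nextOrSame(DayOfWeek.SUNDAY)` is the variant to use.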

Sunday, 12 May 2013

Saving an example set with the details of the process that created it

Often when there is a lot of data to process, it helps to store intermediate results in the repository.

This allows long multi-step processes to proceed through a series of checkpoints so that if an error occurs, you are not forced to go back to the beginning.

Of course, it does require a certain discipline to be clear what each example set is and where it came from. I often fall into the lazy trap of calling example sets "temp1", "temp2" and so on. This makes it difficult to know what you are dealing with.

To get round this, I created a Groovy script that writes the entire process XML into a macro. I then use the macro as an annotation on the example set and store the example set in the repository. If I later want to check how I generated the data, I can load the example set, extract the XML and use it as the basis for recreating the original process.

The Groovy script is only 3 lines long and is shown below.

import com.rapidminer.*;
operator.getProcess().getMacroHandler().addMacro("processXML", operator.getProcess().toString());
return input;

The macro that gets created in this case is called "processXML" and can be used in the normal way.

Tuesday, 7 May 2013

Built-in macros

There are a number of pre-defined macros that can be used within RapidMiner. I keep forgetting the details of these so I decided to write them down once and for all.

These do not show up in the macro view but it is possible to use them like other macros.

I copied the following text from the version 4.6 RapidMiner documentation...

%{a} is replaced by the number of times the operator was applied.
%{b} is replaced by the number of times the operator was applied plus one, i.e. %a + 1. This is a shortcut for %p[1].
%{p[number }] is replaced by the number of times the operator was applied plus the given number, i.e. %a + number. (Note: this should read %{p[N]}, where N is the number to add.)
%{t} is replaced by the system time.
%{n} is replaced by the name of the operator.
%{c} is replaced by the class of the operator.
%{%} becomes %.
%{process_name} becomes the name of the process file (without path and extension).
%{process_file} becomes the name of the process file (with extension).
%{process_path} becomes the path of the process file.

I've tried these. I can't get %{p[n]} to work, nor any of the ones starting with "process_". No matter, the others work.
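
To make the arithmetic in the list concrete, here is a rough sketch in plain Java of how the short macros could expand. It only mimics the documented behaviour and is not RapidMiner's actual implementation. The class name, the method signature, the operator class string and the choice of milliseconds for %{t} are all assumptions made for illustration, and the "process_" macros are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the documented macro semantics.
class MacroSketch {

    // Matches %{a}, %{b}, %{t}, %{n}, %{c}, %{%} and %{p[N]} with N a non-negative integer.
    private static final Pattern MACRO =
            Pattern.compile("%\\{(a|b|t|n|c|%|p\\[(\\d+)\\])\\}");

    static String expand(String text, int applyCount, String operatorName, String operatorClass) {
        Matcher m = MACRO.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value;
            if (key.equals("a")) {
                value = String.valueOf(applyCount);           // times applied
            } else if (key.equals("b")) {
                value = String.valueOf(applyCount + 1);       // times applied plus one
            } else if (key.equals("t")) {
                // Assumption: the real format of the system time is not given in the text.
                value = String.valueOf(System.currentTimeMillis());
            } else if (key.equals("n")) {
                value = operatorName;
            } else if (key.equals("c")) {
                value = operatorClass;
            } else if (key.equals("%")) {
                value = "%";                                  // escaped percent sign
            } else {
                // %{p[N]}: times applied plus N
                value = String.valueOf(applyCount + Integer.parseInt(m.group(2)));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // An operator named "Generate Macro" that has been applied 3 times.
        System.out.println(expand("run_%{a}_%{b}_%{p[5]}_%{n}", 3,
                "Generate Macro", "ExampleOperatorClass"));
        // prints: run_3_4_8_Generate Macro
    }
}
```

The point of the sketch is simply that %{b} and %{p[1]} should give the same value, which is what the documentation means by "shortcut".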

Here is a screenshot of a Generate Macro process that uses them.

Here is a screenshot of the results from the Macro view.