Data Science With RapidMiner: Saving an example set with the details of the process that created it

Sunday, 12 May 2013

Often when there is a lot of data to process, it helps to store intermediate results in the repository.

This allows long multi-step processes to proceed through a series of checkpoints so that if an error occurs, you are not forced to go back to the beginning.

Of course, it takes a certain discipline to keep track of what each example set is and where it came from. I often fall into the lazy trap of calling example sets "temp1", "temp2" and so on, which makes it difficult to know what you are dealing with.

To get round this, I created a Groovy script that writes the entire process XML into a macro. I then use the macro as an annotation attached to the example set, and store the example set in the repository. If I later want to check how I generated the data, I can simply load it, extract the XML and use it as the basis for recreating the original process.

The Groovy script is only 3 lines long and is shown below.

import com.rapidminer.*;
operator.getProcess().getMacroHandler().addMacro("processXML", operator.getProcess().toString());
return input;

The macro that gets created in this case is called "processXML" and can be referenced in the usual way, as %{processXML}, in the parameters of later operators.
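As a variation on the script above, the annotation step could probably be done inside the same script rather than in a separate operator. The following is only a sketch under a few assumptions: that the example set arrives on the first input port of the script operator, that getAnnotations() and setAnnotation(key, value) are available on the incoming object in your RapidMiner version, and that "processXML" is an acceptable annotation key. It is not the approach used in the post, which goes via the macro.

```groovy
import com.rapidminer.operator.IOObject;

// 'operator' and 'input' are bound by the script operator,
// exactly as in the three-line script above.
String xml = operator.getProcess().toString();

// Assumption: the example set arrives on the first input port.
IOObject data = input[0];

// Assumption: getAnnotations() and setAnnotation(key, value) exist in
// your RapidMiner version; "processXML" is simply a chosen key name.
data.getAnnotations().setAnnotation("processXML", xml);

// Keep the macro as well, so later operators can still read %{processXML}.
operator.getProcess().getMacroHandler().addMacro("processXML", xml);

return data;
```

If this works in your version, storing the returned example set in the repository keeps the process XML alongside the data without a separate annotation step.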

4 comments:

Anonymous, 23 May 2013 at 07:26
Hi Andrew,

Would you accept a RapidMiner challenge?
How would you generate a week-ending date from any given date?

Assume the week ends on a Sunday.

Cheers,
Andrew, 23 May 2013 at 21:36
You would use the week number represented by "w" in the various date parsing functions.

date_parse_custom(yearWeek,"yyyy w")

Although there are some gymnastics before then.

I'll make a separate post with an example.
Unknown, 23 October 2015 at 03:35
Hi, how do I save the output or results from my process so that I can access them again later? For example, if I want to see my decision tree again later without running the whole process again.