Saturday 7 November 2009

Stacking

With version 4.6 of Rapidminer, the tutorial entitled 19_Stacking demonstrates stacking. This is a way of feeding the predictions of one or more learning models into a later model in order to improve it.

In the example there are four learning models. In order, these are naive Bayes, decision tree, nearest neighbours and linear regression.

The stacking operator applies the decision tree learner first and adds a new attribute to the example set; this attribute holds the prediction made by the decision tree. Next, it applies the nearest neighbours learner and adds another new attribute. The linear regression learner is applied next to add a third prediction attribute.

At this point, the example set contains the original label and the original attributes, as well as three new attributes holding the three predictions. The naive Bayes model is then trained on this augmented example set and makes the final prediction. This prediction should be better because it can take the other models' predictions into account.
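To make the mechanics concrete, here is a minimal sketch of the same stacking scheme written in Python with scikit-learn rather than RapidMiner; the dataset and class names are illustrative and not part of the tutorial, and logistic regression stands in for the tutorial's linear regression learner. The base learners' predicted classes become extra attributes, and a naive Bayes model is trained on the original attributes plus those predictions.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import StackingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Base learners: their predictions become new attributes for the meta learner.
    # LogisticRegression is an assumed stand-in for the tutorial's linear regression.
    base_learners = [
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("linear", LogisticRegression(max_iter=1000)),
    ]

    # Meta learner: naive Bayes, trained on the original attributes (passthrough=True)
    # plus the predicted classes of the base learners (stack_method="predict").
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=GaussianNB(),
        stack_method="predict",
        passthrough=True,
    )

    stack.fit(X_train, y_train)
    print("held-out accuracy:", stack.score(X_test, y_test))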

The example in the tutorial doesn't demonstrate this improvement.

If you want to see it, do the following: put an XValidation operator around the stacking operator, add an operator chain after it, and then add a model applier operator and a performance operator inside that chain. This is best explained with a picture.



This is a standard way to perform cross-validation; I have a more detailed explanation here.

If you run this you will get what is known as a confusion matrix. This shows how well the model classifies examples by comparing its predictions against the known classifications. Here's another picture to show this.



In this case, the number of incorrect classifications is very small.
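The XValidation, model applier and performance operators correspond roughly, in the scikit-learn sketch above, to cross-validated predictions followed by a confusion matrix. The snippet below continues that sketch and reuses its stack, X and y variables.

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    # Continues the earlier sketch: stack, X and y are defined there.
    # 10-fold cross validation, analogous to wrapping the stack in XValidation.
    predictions = cross_val_predict(stack, X, y, cv=10)

    # Rows are the true classes, columns the predicted classes; the off-diagonal
    # entries are the misclassifications the performance operator reports.
    print(confusion_matrix(y, predictions))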

Now we can change the original stacking operator to see what effect each of the learning models has on the result. For example, disable the decision tree operator and run the process again. This time the result is much poorer. Here's an example.



Now it's possible to try different operators to see what effect each has. There is an advanced feature of Rapidminer that allows this kind of search to be done automatically, but that's a subject for another day.
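As a rough stand-in for disabling operators by hand, the fragment below (continuing the same scikit-learn sketch, not RapidMiner's own search feature) compares the full stack with versions that each leave out one base learner. The helper name stack_without is made up for illustration.

    from sklearn.model_selection import cross_val_score
    from sklearn.base import clone

    # Continues the earlier sketch: stack, base_learners, X and y are defined there.
    def stack_without(excluded):
        # Build a stack with one base learner removed, keeping the same meta learner.
        kept = [(name, clone(est)) for name, est in base_learners if name != excluded]
        return StackingClassifier(estimators=kept, final_estimator=GaussianNB(),
                                  stack_method="predict", passthrough=True)

    for name in [None, "tree", "knn", "linear"]:
        model = stack if name is None else stack_without(name)
        score = cross_val_score(model, X, y, cv=10).mean()
        print("without", name or "nothing", ":", round(score, 3))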
