Edit Jan 2014: Some time ago this operator was renamed from ExampleSet Generator to Generate Data.

This operator has a number of parameters, and the documentation doesn't give quite enough information (I'll update these notes when time and motivation permit).

Here is a list (taken from the source code) of how the label is calculated from the attributes for some of the values of the target_function parameter.

random: label = random

sum: label = att1 + att2 + att3 + ... + attn

polynomial: label = att1*att1*att1 + att2*att2 + att3

non linear: label = att1*att2*att3 + att1*att2 + att2*att2

one variable non linear: label = 3*att1*att1*att1 - att1*att1 + 1000 / abs(att1) + 2000*abs(att1)

complicated function: label = att1*att1*att2 - att1*att2 + max(att1,att2) - exp(att3)

complicated function2: label = att1*att1*att1 + att2*att2 + att1*att2 + att1/att2 - 1/(att3*att3)

simple sinus: label = sin(att1)

sinus: label = sin(att1*att2) + sin(att1+att2)

simple superposition: label = 5*sin(att1) + sin(30*att1)

sinus frequency: label = 10*sin(3*att1) + 12*sin(7*att1) + 11*sin(5*att2) + 9*sin(10*att2) + 10*sin(8*(att1 + att2))

sinus with trend

sinc

triangular function

square pulse function

random classification: label = "positive" or "negative" randomly chosen

one third classification: label = "positive" if att1 is greater than 0.333333333333 otherwise "negative"

sum classification: label = "positive" if the sum of all the attributes is greater than 0 otherwise "negative"

quadratic classification: label = "positive" if att2 > att1^2 otherwise "negative"

simple non linear classification: label = "positive" if att1*att2 is between 50 and 80 otherwise "negative"

interaction classification: label = "positive" if att1 < 0 or (att2 > 0 and att3 < 0) otherwise "negative"

simple polynomial classification: label = "positive" if att1^4 > 100 otherwise "negative"

polynomial classification: label = "positive" if att0^3 + att1^2 - att2^2 + att3 > 0 otherwise "negative"

checkerboard classification

random dots classification

global and local models classification

sinus classification

multi classification: round the sum of the attributes and take the absolute integer value. If the result is divisible by 2, the label is "one"; if divisible by 3 (but not 2), "two"; if divisible by 5 (but not 2 or 3), "three"; otherwise "four"

two gaussians classification

transactions dataset

grid function

three ring clusters

spiral cluster

single gaussian cluster

gaussian mixture clusters: generates clusters in an N-dimensional space, where N is the number of attributes. The number of clusters is 2^N, so take care: 20 attributes lead to more than a million clusters (2^20 = 1,048,576), which slows RapidMiner down somewhat

driller oscillation timeseries
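
To make a few of the formulas above concrete, here is a rough Python sketch that transcribes some of them directly from the list. This is not RapidMiner code: the function names are made up for illustration, `att[0]` stands for att1, and the tie-breaking behaviour of the rounding in the multi classification case is an assumption, since the list only says "round".

```python
import math

# Hypothetical helpers, not part of RapidMiner. Each takes a list of
# attribute values, where att[0] corresponds to att1 in the list above.

def label_sum(att):
    return sum(att)

def label_polynomial(att):
    a1, a2, a3 = att[:3]
    return a1 * a1 * a1 + a2 * a2 + a3

def label_non_linear(att):
    a1, a2, a3 = att[:3]
    return a1 * a2 * a3 + a1 * a2 + a2 * a2

def label_sinus(att):
    a1, a2 = att[:2]
    return math.sin(a1 * a2) + math.sin(a1 + a2)

def label_simple_superposition(att):
    a1 = att[0]
    return 5 * math.sin(a1) + math.sin(30 * a1)

def label_one_third_classification(att):
    return "positive" if att[0] > 0.333333333333 else "negative"

def label_sum_classification(att):
    return "positive" if sum(att) > 0 else "negative"

def label_quadratic_classification(att):
    a1, a2 = att[:2]
    return "positive" if a2 > a1 ** 2 else "negative"

def label_interaction_classification(att):
    a1, a2, a3 = att[:3]
    return "positive" if a1 < 0 or (a2 > 0 and a3 < 0) else "negative"

def label_multi_classification(att):
    # Assumption: Python's round() is used here; RapidMiner may break
    # ties (x.5) differently.
    n = abs(round(sum(att)))
    if n % 2 == 0:
        return "one"
    if n % 3 == 0:
        return "two"
    if n % 5 == 0:
        return "three"
    return "four"

def gaussian_mixture_cluster_count(number_of_attributes):
    # The number of clusters is 2^N for N attributes.
    return 2 ** number_of_attributes

if __name__ == "__main__":
    print(label_polynomial([2, 3, 4]))
    print(label_multi_classification([1.2, 2.9]))
    print(gaussian_mixture_cluster_count(20))
```

Writing the functions this way makes it easy to check a generated example set by hand: feed in the attribute values from one row and compare the result with the label column.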

## Sunday, 8 November 2009
