Sunday 8 November 2009

Generate Data - some detail

Edit Jan 2014: Some time ago this operator was renamed from ExampleSet Generator to Generate Data.

This operator has a number of parameters, and the documentation doesn't quite give enough information about them (I'll update these when time and motivation permit).

Here is a list (taken from the source code) of how the label is calculated from the attributes for some of the values of the target_function parameter.

random: label = random
sum: label = att1 + att2 + att3 + ... + attn
polynomial: label = att1*att1*att1 + att2*att2 + att3
non linear: label = att1*att2*att3 + att1*att2 + att2*att2
one variable non linear: label = 3*att1*att1*att1 - att1*att1 + 1000 / abs(att1) + 2000*abs(att1)
complicated function: label = att1*att1*att2 - att1*att2 + max(att1,att2) - exp(att3)
complicated function2: label = att1*att1*att1 + att2*att2 + att1*att2 + att1/att2 - 1/(att3*att3)
simple sinus: label = sin(att1)
sinus: label = sin(att1*att2) + sin(att1+att2)
simple superposition: label = 5*sin(att1) + sin(30*att1)
sinus frequency: label = 10*sin(3*att1) + 12*sin(7*att1) + 11*sin(5*att2) + 9*sin(10*att2) + 10*sin(8*(att1 + att2))
sinus with trend
sinc
triangular function
square pulse function
random classification: label = "positive" or "negative" randomly chosen
one third classification: label = "positive" if att1 is greater than 0.333333333333 otherwise "negative"
sum classification: label = "positive" if the sum of all the attributes is greater than 0 otherwise "negative"
quadratic classification: label = "positive" if att2 > att1^2 otherwise "negative"
simple non linear classification: label = "positive" if att1*att2 is between 50 and 80 otherwise "negative"
interaction classification: label = "positive" if att1 < 0 or (att2 > 0 and att3 < 0) otherwise "negative"
simple polynomial classification: label = "positive" if att1^4 > 100 otherwise "negative"
polynomial classification: label = "positive" if att0^3 + att1^2 - att2^2 + att3 > 0 otherwise "negative"
checkerboard classification
random dots classification
global and local models classification
sinus classification
multi classification: round the sum of the attributes and take the absolute integer value. If the result is divisible by 2 the label becomes "one"; if divisible by 3 (but not 2) the label becomes "two"; if divisible by 5 (but not 2 or 3) the label becomes "three"; otherwise "four"
two gaussians classification
transactions dataset
grid function
three ring clusters
spiral cluster
single gaussian cluster
gaussian mixture clusters: generates clusters in an N-dimensional space, where N is the number of attributes. The number of clusters is 2^N (so take care: 20 attributes lead to 2^20 = 1,048,576 clusters, which slows RapidMiner down somewhat)
driller oscillation timeseries
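
To make a few of these rules concrete, here is a minimal Python sketch of some of the label functions listed above. It is not the RapidMiner source (which is Java); the function names are hypothetical and only the formulas come from the list. Attributes are passed as a plain list, so att[0] stands for att1, att[1] for att2, and so on. The rounding in multi classification is an assumption: Python's round() is used here, and it may treat exact .5 values differently from whatever the operator does internally.

```python
import math

# Hypothetical helper names; only the formulas come from the list above.
# att is a list of attribute values: att[0] is att1, att[1] is att2, ...

def polynomial(att):
    a1, a2, a3 = att[:3]
    return a1 * a1 * a1 + a2 * a2 + a3

def non_linear(att):
    a1, a2, a3 = att[:3]
    return a1 * a2 * a3 + a1 * a2 + a2 * a2

def sinus(att):
    a1, a2 = att[:2]
    return math.sin(a1 * a2) + math.sin(a1 + a2)

def sum_classification(att):
    return "positive" if sum(att) > 0 else "negative"

def quadratic_classification(att):
    a1, a2 = att[:2]
    return "positive" if a2 > a1 ** 2 else "negative"

def interaction_classification(att):
    a1, a2, a3 = att[:3]
    return "positive" if a1 < 0 or (a2 > 0 and a3 < 0) else "negative"

def multi_classification(att):
    # Assumption: round() stands in for the operator's rounding step.
    n = abs(round(sum(att)))
    if n % 2 == 0:
        return "one"
    if n % 3 == 0:
        return "two"
    if n % 5 == 0:
        return "three"
    return "four"

def gaussian_mixture_cluster_count(number_of_attributes):
    # The number of clusters is 2^N for N attributes.
    return 2 ** number_of_attributes

if __name__ == "__main__":
    print(polynomial([2, 3, 4]))
    print(multi_classification([1.2, 2.9]))
    print(gaussian_mixture_cluster_count(20))
```

Note that a rounded sum of 0 counts as divisible by 2 in this sketch, so it gets the label "one".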
