Showing posts with label Documentation. Show all posts

Thursday, 22 March 2012

Operators that deserve to be better known: part III

Normalize

Of course this is already a well-known operator, but it has a useful option I discovered the other day. If you set the method to "proportion transformation", the normalization divides each value of a numerical attribute by the sum of all the values for that attribute. As a result, each normalized attribute sums to 1.

This is much easier than using loop operators, which would involve a "loop examples" operator inside "loop attributes" together with filtering, calculating sums, generating attributes and perhaps some attribute selection.
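To make the arithmetic concrete, here is a minimal Python sketch of the same calculation outside RapidMiner. The function name `proportion_normalize` and the sample data are made up for illustration, and the sketch assumes every column has a non-zero sum; I have not checked how the operator itself handles a column that sums to zero.

```python
def proportion_normalize(rows):
    """Divide every value in each column by that column's sum.

    Hypothetical helper, not RapidMiner code. Assumes all rows have the
    same length and that no column sums to zero.
    """
    n_cols = len(rows[0])
    # One sum per attribute (column).
    col_sums = [sum(row[j] for row in rows) for j in range(n_cols)]
    # Each value becomes its share of the column total.
    return [[row[j] / col_sums[j] for j in range(n_cols)] for row in rows]


# Illustrative data: three examples, two numerical attributes.
data = [[1.0, 10.0], [3.0, 30.0], [4.0, 40.0]]
normalized = proportion_normalize(data)

# After normalization each attribute should add up to 1.
column_totals = [sum(col) for col in zip(*normalized)]
```

Even in this small form the calculation needs two passes over the data (one for the sums, one for the division), which is what the nested loop operators would have to reproduce.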

Wednesday, 12 October 2011

Generate Macro bonus features

I was pleased to discover that some of the functions available in the operator "Generate Attributes" are also available in the "Generate Macro" operator.

For example, the following functions work:

concat
contains
matches
index
str
upper
lower
escape_html
replace

and I imagine that similar text processing functions will also work.

I tried date_now() and some other date functions and got a result that looked like an error (actually quite an interesting error), so I assume date functions cannot be used.


Sunday, 8 November 2009

Generate Data - some detail

Edit Jan 2014: Some time ago this operator changed its name from ExampleSet Generator to Generate Data.

This operator has a number of parameters and the documentation doesn't give quite enough information (I'll update these when time and motivation permit).

Here is a list (taken from the source code) of how the label is calculated from the attributes for some of the values of the target_function parameter.

random: label = random
sum: label = att1 + att2 + att3 + ... + attn
polynomial: label = att1*att1*att1 + att2*att2 + att3
non linear: label = att1*att2*att3 + att1*att2 + att2*att2
one variable non linear: label = 3*att1*att1*att1 - att1*att1 + 1000 / abs(att1) + 2000*abs(att1)
complicated function: label = att1*att1*att2 - att1*att2 + max(att1,att2) - exp(att3)
complicated function2: label = att1*att1*att1 + att2*att2 + att1*att2 + att1/att2 - 1/(att3*att3)
simple sinus: label = sin(att1)
sinus: label = sin(att1*att2) + sin(att1+att2)
simple superposition: label = 5*sin(att1) + sin(30*att1)
sinus frequency: label = 10*sin(3*att1) + 12*sin(7*att1) + 11*sin(5*att2) + 9*sin(10*att2) + 10*sin(8*(att1 + att2))
sinus with trend
sinc
triangular function
square pulse function
random classification: label = "positive" or "negative" randomly chosen
one third classification: label = "positive" if att1 is greater than 0.333333333333 otherwise "negative"
sum classification: label = "positive" if the sum of all the attributes is greater than 0 otherwise "negative"
quadratic classification: label = "positive" if attribute2 > attribute1^2 otherwise "negative"
simple non linear classification: label = "positive" if attribute1*attribute2 is between 50 and 80 otherwise "negative"
interaction classification: label = "positive" if att1 < 0 or (att2 > 0 and att3 < 0) otherwise "negative"
simple polynomial classification: label = "positive" if att1^4 > 100 otherwise "negative"
polynomial classification: label = "positive" if att0^3 + att1^2 - att2^2 + att3 > 0 otherwise "negative"
checkerboard classification
random dots classification
global and local models classification
sinus classification
multi classification: round the sum of the attributes and take an absolute integer value, if the result is divisible by 2 the label becomes "one", if divisible by 3 (but not 2) the label becomes "two", if divisible by 5 (but not 2 or 3) the label becomes "three", otherwise "four"
two gaussians classification
transactions dataset
grid function
three ring clusters
spiral cluster
single gaussian cluster
gaussian mixture clusters: generates clusters in an N dimensional space where N is the number of attributes. The number of clusters is 2^N (so take care, 20 attributes lead to more than a million clusters, which slows RapidMiner down somewhat)
driller oscillation timeseries
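A few of the formulas in the list above can be written out as a short Python sketch, which makes it easy to check a generated label by hand. This is only a transcription of the descriptions in this post, not the RapidMiner source: the function names are made up, "lt" and "gt" are read as "less than" and "greater than", and Python's `round` resolves exact .5 ties to the nearest even integer, which may not match the rounding the operator uses.

```python
import math


def polynomial(att1, att2, att3):
    # polynomial: att1*att1*att1 + att2*att2 + att3
    return att1 ** 3 + att2 ** 2 + att3


def non_linear(att1, att2, att3):
    # non linear: att1*att2*att3 + att1*att2 + att2*att2
    return att1 * att2 * att3 + att1 * att2 + att2 * att2


def simple_superposition(att1):
    # simple superposition: 5*sin(att1) + sin(30*att1)
    return 5 * math.sin(att1) + math.sin(30 * att1)


def interaction_classification(att1, att2, att3):
    # "positive" if att1 < 0 or (att2 > 0 and att3 < 0), otherwise "negative"
    return "positive" if att1 < 0 or (att2 > 0 and att3 < 0) else "negative"


def multi_classification(attributes):
    # Round the sum of the attributes and take the absolute integer value,
    # then test divisibility by 2, 3 and 5 in that order.
    n = abs(round(sum(attributes)))
    if n % 2 == 0:
        return "one"
    if n % 3 == 0:
        return "two"
    if n % 5 == 0:
        return "three"
    return "four"


def gaussian_mixture_cluster_count(n_attributes):
    # gaussian mixture clusters: 2^N clusters for N attributes
    return 2 ** n_attributes
```

For example, `multi_classification([1.0, 2.0])` gives a rounded sum of 3 and so the label "two", and `gaussian_mixture_cluster_count(20)` evaluates to 1048576, which is the "more than a million clusters" mentioned above.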