Thursday, 30 December 2010

Counting clusters

The cluster count operator had a bug that the RapidMiner developers fixed.

Here's an example process that shows it working (warning: it takes 10 minutes to run).

Plot a histogram of the clusterNumber attribute to get an idea of where a good clustering might be (hint: 8 is the "correct" answer).

Edit: This process uses the DBScan operator.
Edit 2: I changed the process to create better sample data, so the expected number of clusters is more obvious.
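
The idea of counting clusters over a range of settings and then looking at a histogram of the counts can be sketched in a few lines of Python. This is only an illustration, not the RapidMiner process itself: the tiny `dbscan` function, the sample data (8 small blobs of points) and the list of `eps` values are all assumptions made up for the example.

```python
import math
from collections import Counter, deque


def dbscan(points, eps, min_points):
    """Minimal DBSCAN: returns one label per point (-1 = noise, 0..k-1 = cluster)."""
    unseen, noise = None, -1
    labels = [unseen] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unseen:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not unseen:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_points:
                queue.extend(more)  # j is a core point, keep expanding
        cluster += 1
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len({label for label in labels if label != -1})


# Hypothetical sample data: 8 blobs, each a 3x3 grid of points spaced 0.5 apart,
# with the blobs placed 10 units apart along the x axis.
points = [
    (10.0 * blob + 0.5 * dx, 0.5 * dy)
    for blob in range(8)
    for dx in range(3)
    for dy in range(3)
]

# Hypothetical range of eps values to try.
eps_values = [0.3, 0.6, 1.0, 2.0, 4.0, 8.0, 9.5, 12.0]
cluster_numbers = [count_clusters(dbscan(points, eps, 3)) for eps in eps_values]

# Text histogram of the cluster counts: the most frequent count is the candidate.
histogram = Counter(cluster_numbers)
for number, frequency in sorted(histogram.items()):
    print(f"{number:2d} clusters: {'#' * frequency}")
```

With this made-up data the count of 8 appears for most of the `eps` values, which is the kind of peak the histogram is meant to show.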

Tuesday, 30 November 2010

Differentiate value series

A copy of the process from my posting on the Rapid-I forum.

Link to file

Still works in version 5.1

Monday, 29 November 2010

MultiLabelClassification

Copied from my process on the Rapid-I forum.

File

Sunday, 29 August 2010

Various links

I'm studying for an MSc in data mining and business intelligence at the Institute of Technology Blanchardstown.

RapidMiner resources site video tutorials

Neuralmarkettrends video tutorials

Vancouverdata video tutorials

Thursday, 29 July 2010

Value series example

Copied from my posting on the Rapid-I site

Link to file

(Still works with 5.1)

(Still works with 5.2)

Friday, 30 April 2010

Fast Fourier Transform Example

Originally posted by me (awc) on the neuralmarkettrends forum

See file here

Note: updated later to fix a problem in 5.1, where attribute names cannot be reserved words.

Note2: the data is available on the neuralmarkettrends forum posting and is also here

Sunday, 3 January 2010

Using NullGenerator in value series preprocessing

Here's an example that uses value series preprocessing. (This no longer works in 5.1; the series operators have not worked well in version 5 as a whole.)

<operator name="Root" class="Process" expanded="yes">
  <parameter key="random_seed" value="-1"/>
  <operator name="GenerateSeriesIOObjects" class="OperatorChain" expanded="no">
    <operator name="Generate a sine wave SeriesIOObject" class="SinusGenerator">
      <parameter key="number_of_values" value="2000"/>
      <list key="frequency">
        <parameter key="101" value="1.0"/>
      </list>
    </operator>
    <operator name="Visualizer (2)" class="Visualizer" activated="no">
    </operator>
    <operator name="Convert into an ExampleSet" class="SeriesObject2ExampleSet">
    </operator>
    <operator name="Window into examples" class="MultivariateSeries2WindowExamples">
      <parameter key="window_size" value="1002"/>
    </operator>
    <operator name="Add a label" class="WindowExamples2ModelingData">
      <parameter key="label_name_stem" value="sinus_dim_1"/>
      <parameter key="relative_transformation" value="false"/>
    </operator>
    <operator name="Delete the Id" class="AttributeFilter">
      <parameter key="condition_class" value="attribute_name_filter"/>
      <parameter key="parameter_string" value="sinus_index"/>
      <parameter key="invert_filter" value="true"/>
      <parameter key="apply_on_special" value="true"/>
    </operator>
    <operator name="Add an easier to read Id" class="IdTagging">
    </operator>
    <operator name="Change examples into seriesIO objects" class="Single2Series">
    </operator>
  </operator>
  <operator name="ValueSeriesPreprocessing" class="ValueSeriesPreprocessing" expanded="yes">
    <operator name="Branch" class="Branching" expanded="yes">
      <parameter key="keep_only_last" value="false"/>
      <operator name="Find the maximum frequency within each window" class="OperatorChain" expanded="yes">
        <operator name="Split each example into 5 windows" class="Windowing" expanded="yes">
          <parameter key="step_size" value="200"/>
          <parameter key="window_size" value="200"/>
          <operator name="OperatorChain (4)" class="OperatorChain" expanded="yes">
            <operator name="DiscreteFourierTransform (2)" class="DiscreteFourierTransform">
            </operator>
            <operator name="MaxIndex" class="MaxIndex">
            </operator>
          </operator>
        </operator>
        <operator name="NullGenerator (4)" class="NullGenerator">
        </operator>
      </operator>
    </operator>
  </operator>
</operator>

This process does the following things:
  1. Generates a set of examples, each containing 1001 attributes.
  2. Converts each example into a SeriesIO object.
  3. Splits each SeriesIO object into 5 windows, each containing 200 values.
  4. Performs a Fourier transform on each window and finds the index where the maximum is located.
  5. The NullGenerator operator causes the maximum index for each window to be returned. This has the effect of returning the maxima for all the windows.
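
For anyone who can't run the process any more, the windowing and Fourier steps can be approximated outside RapidMiner. The Python below is a rough sketch under my own assumptions, not a port of the operators: the function names are made up, the test signal is a plain sine wave with 10 cycles per window, and I'm assuming the "maximum index" means the frequency bin with the largest magnitude.

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitude of each DFT bin from 0 up to len(values) // 2 (naive O(n^2) DFT)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n // 2 + 1)
    ]


def split_windows(values, window_size, step_size):
    """Split a series into consecutive windows (only full windows are kept)."""
    last_start = len(values) - window_size
    return [values[s:s + window_size] for s in range(0, last_start + 1, step_size)]


def max_index_per_window(values, window_size=200, step_size=200):
    """For each window, return the index of the DFT bin with the largest magnitude."""
    result = []
    for window in split_windows(values, window_size, step_size):
        magnitudes = dft_magnitudes(window)
        result.append(max(range(len(magnitudes)), key=magnitudes.__getitem__))
    return result


# Hypothetical test signal: 1000 values, 10 full sine cycles per 200-value window.
series = [math.sin(2 * math.pi * 10 * t / 200) for t in range(1000)]

peaks = max_index_per_window(series)
print(peaks)  # [10, 10, 10, 10, 10]
```

Each of the 5 windows reports bin 10, one value per window, which mirrors the way the process returns the maxima for all the windows.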