The cluster count operator had a bug that the RapidMiner developers fixed.
Here's an example process that shows it working (warning: it takes 10 minutes to run).
Plot a histogram of clusterNumber to get an idea of where a good clustering might be (hint: 8 is the "correct" answer).
Edit: This process uses the DBScan operator.
Edit2: I changed the process to create better sample data so the expected number of clusters is more obvious.
Thursday, 30 December 2010
Tuesday, 30 November 2010
Monday, 29 November 2010
Sunday, 29 August 2010
Various links
I'm studying for an MSc in data mining and business intelligence at the Institute of Technology Blanchardstown.
Rapidminer resources site video tutorials
Neuralmarkettrends video tutorials
Vancouverdata video tutorials
Thursday, 29 July 2010
Value series example
Copied from my posting on the Rapid-I site
Link to file
(Still works with 5.1)
(Still works with 5.2)
Friday, 30 April 2010
Fast Fourier Transform Example
Sunday, 3 January 2010
Using NullGenerator in value series preprocessing
Here's an example that uses value series preprocessing. (It no longer works in 5.1; the series operators have not worked well in version 5 as a whole.)
<operator name="Root" class="Process" expanded="yes">
  <parameter key="random_seed" value="-1"/>
  <operator name="GenerateSeriesIOObjects" class="OperatorChain" expanded="no">
    <operator name="Generate a sine wave SeriesIOObject" class="SinusGenerator">
      <parameter key="number_of_values" value="2000"/>
      <list key="frequency">
        <parameter key="101" value="1.0"/>
      </list>
    </operator>
    <operator name="Visualizer (2)" class="Visualizer" activated="no">
    </operator>
    <operator name="Convert into an ExampleSet" class="SeriesObject2ExampleSet">
    </operator>
    <operator name="Window into examples" class="MultivariateSeries2WindowExamples">
      <parameter key="window_size" value="1002"/>
    </operator>
    <operator name="Add a label" class="WindowExamples2ModelingData">
      <parameter key="label_name_stem" value="sinus_dim_1"/>
      <parameter key="relative_transformation" value="false"/>
    </operator>
    <operator name="Delete the Id" class="AttributeFilter">
      <parameter key="condition_class" value="attribute_name_filter"/>
      <parameter key="parameter_string" value="sinus_index"/>
      <parameter key="invert_filter" value="true"/>
      <parameter key="apply_on_special" value="true"/>
    </operator>
    <operator name="Add an easier to read Id" class="IdTagging">
    </operator>
    <operator name="Change examples into seriesIO objects" class="Single2Series">
    </operator>
  </operator>
  <operator name="ValueSeriesPreprocessing" class="ValueSeriesPreprocessing" expanded="yes">
    <operator name="Branch" class="Branching" expanded="yes">
      <parameter key="keep_only_last" value="false"/>
      <operator name="Find the maximum frequency within each window" class="OperatorChain" expanded="yes">
        <operator name="Split each example into 5 windows" class="Windowing" expanded="yes">
          <parameter key="step_size" value="200"/>
          <parameter key="window_size" value="200"/>
          <operator name="OperatorChain (4)" class="OperatorChain" expanded="yes">
            <operator name="DiscreteFourierTransform (2)" class="DiscreteFourierTransform">
            </operator>
            <operator name="MaxIndex" class="MaxIndex">
            </operator>
          </operator>
        </operator>
        <operator name="NullGenerator (4)" class="NullGenerator">
        </operator>
      </operator>
    </operator>
  </operator>
</operator>
This process does the following things:
- Generates a set of examples each containing 1001 attributes
- Each example is converted into a SeriesIO object
- Each SeriesIO object is split into 5 windows each containing 200 values
- A Fourier transform is performed on each window, and the index at which the maximum occurs is found
- The NullGenerator operator causes the maximum index for each window to be returned, so the process outputs the maxima for all the windows
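For readers without a working copy of the series plugin, the windowing and Fourier steps listed above can be sketched outside RapidMiner. This is only a rough Python illustration, not what the operators do internally: the function names, the test signal (10 cycles per 200-sample window), and the choice to skip bin 0 and look only at the first half of the spectrum are all assumptions made for this sketch.

```python
import cmath
import math


def dft_magnitudes(values):
    """Return |X[k]| for k = 0..n-1 using a direct (O(n^2)) DFT."""
    n = len(values)
    mags = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(-2j * cmath.pi * k * t / n)
        mags.append(abs(total))
    return mags


def max_index_per_window(series, window_size=200, step_size=200):
    """Split a series into windows and return the peak DFT bin of each one."""
    peaks = []
    for start in range(0, len(series) - window_size + 1, step_size):
        window = series[start:start + window_size]
        mags = dft_magnitudes(window)
        # For real input only the first half of the spectrum is unique.
        # Bin 0 (the mean) is skipped; that is a choice made for this sketch.
        half = mags[1:window_size // 2]
        peaks.append(1 + half.index(max(half)))
    return peaks


# Hypothetical test signal: 1001 values, 10 full cycles per 200 samples.
series = [math.sin(2 * math.pi * 10 * t / 200) for t in range(1001)]
peaks = max_index_per_window(series)
print(peaks)
```

With 1001 values and a window and step size of 200, the loop yields 5 windows, matching the list above, and the leftover value at the end is ignored. For this test signal every window peaks at bin 10.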