
Monday, 25 August 2014


Here is a process to plot the Mandelbrot set. It's based on the one that was successful at the recent RapidMiner World conference.

It makes pretty pictures like this.

Various macros control the execution of the process. With the following settings,

yPoints: 80
xPoints: 120
iterations: 200
xmin: -0.95
xmax: -0.855
ymin: 0.2375
ymax: 0.3275

a zoomed-in view like this is produced. How cool.

I noticed a feature of the advanced plotter that limits the number of points that get plotted. This is a configuration setting found at Tools->Preferences->Gui->rapidminer.gui.plotter.rows.maximum, and it is 5,000 by default. If you want to see all the points for the settings above, set it to at least 9,600 (120 x 80).
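As a quick sanity check on that number, here is a small Python sketch. It is not part of the RapidMiner process, and the function name is just something I made up for illustration. It compares the grid size with the plotter limit.

```python
def points_exceed_limit(x_points, y_points, plot_limit=5000):
    """Return the grid size and whether it is above the plotter's row limit."""
    total = x_points * y_points
    return total, total > plot_limit


# xPoints = 120 and yPoints = 80, checked against the default limit of 5,000
total, too_many = points_exceed_limit(120, 80)
print(total, too_many)
```

If the second value is true, the limit needs raising to at least the first value before every point shows up in the plot.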

The process itself is in two main parts.

Firstly, the sub-process creates the x and y axes which I called x0 and y0. This is done using the operators "Generate Data", "Generate ID", and "Normalize" for the x and y axes. These are then joined using the "Cartesian Product" operator to produce all possible combinations of the x and y axes. The resulting example set is stored in the process context using the "Remember" operator.
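For anyone who thinks better in code, the same idea can be sketched in Python. This is only an analogy to the RapidMiner operators, not a translation of them: the helper names make_axis and make_grid are hypothetical, and I'm assuming each axis is evenly spaced and includes both end points.

```python
from itertools import product


def make_axis(n_points, lo, hi):
    """Return n_points evenly spaced values from lo to hi (both included)."""
    if n_points == 1:
        return [lo]
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]


def make_grid(x_points, y_points, xmin, xmax, ymin, ymax):
    """Build every (x0, y0) combination, like a Cartesian product of the axes."""
    x_axis = make_axis(x_points, xmin, xmax)
    y_axis = make_axis(y_points, ymin, ymax)
    return list(product(x_axis, y_axis))


# The zoomed-in settings from the macros above
grid = make_grid(120, 80, -0.95, -0.855, 0.2375, 0.3275)
print(len(grid))  # 9600
```

Each tuple in grid plays the role of one example with attributes x0 and y0.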

Secondly, the "Loop" operator uses the "Recall" operator to get the latest example set to work on and performs the necessary calculations to generate the Mandelbrot set. The result of each iteration is remembered in the process context so the next loop iteration can carry on. There is some cunning filtering to reduce the amount of effort in each loop. Note the "Materialize Data" operator. This is often needed and does no harm if it is included.
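Here is a rough Python sketch of what such a loop could look like. The assumptions are mine: I'm using the usual escape test (|z| > 2) and assuming the filtering means that points which have already escaped are dropped from later iterations. The function and field names are hypothetical and do not come from the process.

```python
def mandelbrot_counts(points, iterations):
    """Count, for each (x0, y0), how many iterations stay within |z| <= 2."""
    # One record per point, standing in for the remembered example set.
    state = [{"x0": x0, "y0": y0, "x": 0.0, "y": 0.0, "count": 0}
             for x0, y0 in points]
    active = state  # records that have not escaped yet
    for _ in range(iterations):
        survivors = []
        for p in active:
            x, y = p["x"], p["y"]
            # z -> z^2 + c, written out in real and imaginary parts
            p["x"] = x * x - y * y + p["x0"]
            p["y"] = 2 * x * y + p["y0"]
            if p["x"] ** 2 + p["y"] ** 2 <= 4.0:
                p["count"] += 1
                survivors.append(p)
        # Assumed filtering step: escaped points are not processed again.
        active = survivors
        if not active:
            break
    return [p["count"] for p in state]


counts = mandelbrot_counts([(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)], 200)
print(counts)  # [200, 2, 0]
```

The point (0, 0) never escapes, so it reaches the full iteration count, while the other two drop out early and cost nothing in later passes.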

At the end of the loop operation, nothing is output from the "Loop" operator itself. The output from the main process is simply a "Recall" operator which uses the last example set that was worked on inside the loop operation.

By having nothing output from the loop operation, the memory impact of this process is reduced.

Sunday, 3 August 2014

New videos coming soon

I've created another set of videos. These are slightly more advanced and tend to combine more operators together to tell a story.

Here's a graphic using RapidMiner's advanced plotting capabilities that shows the video names and the main operators explained during the video.

They'll be available on the RapidMinerResources site very soon.

I plan to do some new ones over the next few months and the question is what do I choose?

My current candidate list is:

  • Groovy Dark Arts
  • Text Processing 
  • Web Mining
  • Time Series in more detail
  • RapidMiner Server

Each would translate to between 10 and 20 videos.

To help me decide which one I will do next, I'd be happy to get feedback. So please leave a comment and it will certainly help me.

Edit: I took the liberty of doing a mini survey at the RapidMiner World conference. The results are shown here.

I'll take notice of this and give some focus to Text Mining and RapidMiner Server.