Monday, 6 October 2014

RapidMiner Resources advanced videos

After a bit of work, I'm pleased to say I've completed the RapidMiner Resources advanced videos, and they'll be available on the RapidMiner Resources site soon.

I maintain metadata about the videos and operators and, for fun, I've made a process using this data and an operator I recently discovered called "Transition Graph". This is a candidate for "operators that deserve to be better known" because it allows pretty graphs to be drawn.

The metadata records the main operators each video uses, the overall running time of the video, and the course it is classified under. Here's a process that takes this data and draws different graphs, showing either the operators used by a given video or the videos that use a given operator.

A brief note on the names - I've prepended "o" for operator and "v" for video to make things clear.
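For anyone who wants to see the idea outside RapidMiner, here is a minimal Python sketch of how the metadata could be turned into the edges of such a graph. It is only an illustration: the column names (`video`, `operator`, `course`) and the sample rows are assumptions, not the layout of the real CSV file, and it does not reproduce what the "Transition Graph" operator does internally.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real CSV's columns and contents are assumptions here.
SAMPLE_CSV = """video,operator,course
Macros,Generate Macro,Advanced
Macros,Set Macro,Advanced
Loops,Generate Macro,Advanced
Loops,Loop,Advanced
"""


def build_edges(csv_text):
    """Return (video, operator) pairs, prefixing names with "v" and "o"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [("v" + row["video"], "o" + row["operator"]) for row in reader]


def videos_per_operator(edges):
    """Count how many distinct videos use each operator."""
    videos = defaultdict(set)
    for video, operator in edges:
        videos[operator].add(video)
    return {operator: len(names) for operator, names in videos.items()}


edges = build_edges(SAMPLE_CSV)
usage = videos_per_operator(edges)
print(usage)
```

The prefixes keep a video and an operator with the same name apart, which matters once both end up as nodes in one graph.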

Here is a graph showing that the "Generate Macro" operator is important in 5 videos.

Here is another graph that shows the most important operators used by the video called "Macros".

Here's another that shows which operators are covered by each course and what overlap there is.

The process reads a CSV file (here) to generate these graphics. Of course, as time goes on, I will add new videos so the data in the process is a snapshot as at early October 2014. Nonetheless, please feel free to download the process and data and play around with the results to see the videos I have created and the operators that are covered.

The next videos to do are about text mining...

Monday, 25 August 2014


Here is a process to plot the Mandelbrot set. It's based on the one that was successful at the recent RapidMiner World conference.

It makes pretty pictures like this.

Various macros control the execution of the process. With the following settings,

yPoints: 80
xPoints: 120
iterations: 200
xmin: -0.95
xmax: -0.855
ymin: 0.2375
ymax: 0.3275

a zoomed-in view like this is produced - how cool.

I noticed a feature of the advanced plotter that limits the number of points that get plotted. This is a configuration setting found at Tools->Preferences->Gui->rapidminer.gui.plotter.rows.maximum. This is 5,000 by default. The settings above produce 120 x 80 = 9,600 points, so if you want to see all of them, set this to at least 9,600.
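The arithmetic behind that figure is simply the grid size, sketched here in Python with variable names of my own choosing:

```python
# Macro values from the settings above.
x_points = 120
y_points = 80

# Default plotter row limit mentioned in the post.
default_row_limit = 5000

# One plotted point per (x, y) combination.
total_points = x_points * y_points
needs_higher_limit = total_points > default_row_limit

print(total_points, needs_higher_limit)
```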

The process itself is in 2 main parts.

Firstly, the sub-process creates the x and y axes which I called x0 and y0. This is done using the operators "Generate Data", "Generate ID", and "Normalize" for the x and y axes. These are then joined using the "Cartesian Product" operator to produce all possible combinations of the x and y axes. The resulting example set is stored in the process context using the "Remember" operator.
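As a rough Python equivalent of this first part, the sketch below builds two evenly spaced axes and combines them into every (x0, y0) pair. It assumes the "Generate ID" and "Normalize" steps amount to rescaling the IDs 1..n linearly onto the range given by the macros; the helper name `axis` is mine, not something from the process.

```python
from itertools import product


def axis(n_points, lo, hi):
    """Evenly spaced values from lo to hi inclusive (assumed effect of ID + rescale)."""
    if n_points == 1:
        return [lo]
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]


# Macro values from the settings listed earlier in the post.
x0 = axis(120, -0.95, -0.855)
y0 = axis(80, 0.2375, 0.3275)

# Equivalent of "Cartesian Product": every combination of x0 and y0.
grid = list(product(x0, y0))
print(len(grid))
```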

Secondly, the "Loop" operator uses the "Recall" operator to get the latest example set to work on and performs the necessary calculations to generate the Mandelbrot set. The result of each iteration is remembered in the process context so the next loop iteration can carry on. There is some cunning filtering to reduce the amount of effort in each loop. Note the "Materialize Data" operator. This is often needed and does no harm if it is included.
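The loop can be sketched in Python as the usual escape-time calculation. The filtering shown here, dropping points once they have escaped so later passes only touch the remainder, is my assumption about what the filtering achieves; the process itself may do it differently, and the function name and escape radius of 2 are choices made for this sketch.

```python
def mandelbrot_counts(points, iterations):
    """Escape-time count per point c; points that never escape get `iterations`."""
    counts = {c: iterations for c in points}
    # Current z value for the points that are still being iterated.
    active = {c: 0j for c in points}
    for n in range(iterations):
        still_active = {}
        for c, z in active.items():
            z = z * z + c
            if abs(z) > 2.0:
                # Escaped on this pass: record the count and drop the point.
                counts[c] = n + 1
            else:
                still_active[c] = z
        # Carry only the unescaped points into the next pass.
        active = still_active
        if not active:
            break
    return counts


points = [complex(0.0, 0.0), complex(1.0, 1.0), complex(-1.0, 0.0)]
result = mandelbrot_counts(points, 200)
print(result)
```

Each pass works on a smaller set than the one before, which is the same motivation as filtering the example set inside the "Loop" operator.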

At the end of the loop operation, nothing is output from the "Loop" operator itself. The output from the main process is simply a "Recall" operator which uses the last example set that was worked on inside the loop operation.

By having nothing output from the loop operation, the memory impact of this process is reduced.

Sunday, 3 August 2014

New videos coming soon

I've created another set of videos. These are slightly more advanced and tend to combine more operators together to tell a story.

Here's a graphic using RapidMiner's advanced plotting capabilities that shows the video names and the main operators explained during the video.

They'll be available on the RapidMinerResources site very soon.

I plan to do some new ones over the next few months and the question is what do I choose?

My current candidate list is:

  • Groovy Dark Arts
  • Text Processing 
  • Web Mining
  • Time Series in more detail
  • RapidMiner Server
Each would translate to between 10 and 20 videos. 

To help me decide which one to do next, I'd be happy to get feedback, so please leave a comment.

Edit: I took the liberty of doing a mini survey at the RapidMiner World conference. The results are shown here.

I'll take notice of this and give some focus to Text Mining and RapidMiner Server.