Saturday 6 May 2023

Reading more examples than your licence allows

Recently, I found a way to read more examples than your licence allows.

With the free version of RapidMiner Studio, example sets are limited to 10,000 rows. Using the Python or R scripting operators, it is of course possible to read more than this, but as soon as the example sets are returned to RapidMiner, the licence limit is imposed.

However, if the data is split into batches of 10,000 rows, these batches can be placed into a collection. Common processing can then be applied to each batch using the Loop Collection operator.

Of course, if you append the collection entries and the result exceeds your licence limit, the limit will be imposed again.

The Python code looks a bit like this.

import pandas

def rm_main():
    # Read the full file, then slice it into 10,000-row batches
    df = pandas.read_csv('mybigdata.csv')
    batch1 = df[0:10000]
    batch2 = df[10000:20000]
    return batch1, batch2

Connect two outputs of the Python operator to a Collect operator, and the resulting collection will hold 20,000 rows as two example sets of 10,000 rows each.
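For files with more than two batches, the slicing can be generalised. This is only a rough sketch: the helper name split_into_batches and the BATCH_SIZE constant are my own inventions, and I am assuming that the number of batches returned has to match the number of output ports you have connected on the Python operator.

```python
import pandas as pd

BATCH_SIZE = 10_000  # the free-version row limit mentioned above


def split_into_batches(df, batch_size=BATCH_SIZE):
    """Return consecutive row slices of df, each at most batch_size rows long."""
    return [
        df.iloc[start:start + batch_size]
        for start in range(0, len(df), batch_size)
    ]


def rm_main():
    # 'mybigdata.csv' is the same placeholder file name used earlier
    df = pd.read_csv('mybigdata.csv')
    # Assumption: one connected output port per returned batch
    return tuple(split_into_batches(df))
```

The last batch is simply whatever rows are left over, so it may be shorter than the others.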

I could have written the whole thing in Python of course.
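For what it is worth, a pure Python version might look something like the sketch below. It relies on the chunksize argument of pandas.read_csv, which yields the file one batch at a time. The function name process_in_batches and the process callback are hypothetical stand-ins for whatever common processing you would otherwise do inside the loop in RapidMiner.

```python
import pandas as pd


def process_in_batches(source, process, batch_size=10_000):
    """Read a CSV in batches and apply the same function to each one.

    source  -- a file path or file-like object accepted by pandas.read_csv
    process -- hypothetical callback taking one DataFrame batch
    """
    results = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    for batch in pd.read_csv(source, chunksize=batch_size):
        results.append(process(batch))
    return results
```

Calling process_in_batches('mybigdata.csv', len), for example, would give the row count of each batch.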

Needless to say, RapidMiner might object to such a breach of its licensing terms, so you should not use this unless you are willing to accept the consequences.