
Thursday, 25 January 2018

Keras + RapidMiner + digit recognition = 97% accuracy

I've successfully created a process using RapidMiner and Keras to recognise the MNIST handwritten digits with a headline accuracy of 97% on unseen data.

You can download the process here.

It requires R and Keras to be installed - an exercise for the reader.

The main features of the process are

  • R is used to fetch the MNIST data and create a training set and a test set. R also attaches the label to the data so that each case becomes a single example set, rather than keeping separate structures for the data and the labels. This is a big strength of RapidMiner: all the data and labels are in one place.
  • The data is restructured in R to change 3d tensors of shape (60000, 28, 28) into 2d tensors of shape (60000, 784). The 3d tensor represents the images, each of size 28 by 28 pixels (28 × 28 = 784). RapidMiner example sets are 2d tensors, and these can be fed directly into the Keras part of the process.
  • The Keras part of the model has the following characteristics
    • The input shape is (784,), which matches the number of columns in the 2d tensor.
    • The loss parameter is set to "categorical_crossentropy" and the optimizer is set to "RMSprop".
    • There are 2 layers in the Keras model. The first is "dense" with 512 units and activation set to "relu". The second is "dense" with 10 units and activation set to "softmax". The 10 here is the number of possible values the label can take (the digits 0 to 9).
  • The "validation_split" parameter is set to 0.1 so that a loss is calculated on a small held-out part of the training data. This produces validation loss results in the output, which can be used to see when over-fitting is happening.
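The settings above can be sketched in Python's Keras API (the RapidMiner process configures the same things through the Keras operator, with R handling the data; the random placeholder data here stands in for the flattened MNIST rows):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# The input shape (784,) matches the 784 columns of the flattened images.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),     # first dense layer
    layers.Dense(10, activation="softmax"),   # one unit per digit 0-9
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])

# Placeholder data in place of the real MNIST rows, just to show the fit
# call. validation_split=0.1 holds out 10% of the training data so a
# validation loss is reported after every epoch.
x = np.random.rand(100, 784).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)
history = model.fit(x, y, epochs=1, batch_size=32,
                    validation_split=0.1, verbose=0)
```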
Here is a screenshot of the history from a large run (this is output from the Keras model as an example set).

The training loss (in blue) decreases steadily as the model fits the training data more and more closely. The loss against the validation data (in red) worsens as the number of epochs increases, and the variation between epochs is evidence that perhaps I should use a larger training fraction. Nonetheless, only a small number of epochs would be enough to get a model that performs well on unseen data.

The Keras model does not use convolution layers (an exercise for a later post) but despite this, it performs very well. Here is the confusion matrix using 3 epochs.
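RapidMiner builds the confusion matrix automatically through its performance operators, but the idea is simple enough to sketch in a few lines of Python (the labels below are made up for illustration):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows are the actual digits, columns are the predicted digits."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

# A handful of example labels: one 3 is misclassified as a 5.
actual    = np.array([3, 3, 5, 8, 5])
predicted = np.array([3, 5, 5, 8, 5])
cm = confusion_matrix(actual, predicted)
print(cm)
```

The diagonal counts the correct predictions, so overall accuracy is the trace divided by the total number of examples.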

This is a very good result and shows the power of deep learning. It's gratifying that RapidMiner supports it.

As time permits, a future post will look at using convolution layers to see what improvements could be achieved. I may also do some systematic experiments to check how validation loss measured during training maps to actual loss on unseen data.

Wednesday, 24 January 2018

Visualising the MNIST numbers data

Keras comes with some built-in functions to obtain the MNIST dataset, which is derived from handwriting data collected by the National Institute of Standards and Technology. As far as I can tell, it's not possible to get access to these from within RapidMiner but never fear, here is a process that can do it.

It uses R and obviously requires Keras to have been installed. I'll leave that to the reader to get right.
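For reference, the equivalent call in Python is a one-liner; the R keras package used by the process exposes a matching function:

```python
from tensorflow import keras

# Downloads (and caches) the MNIST data: 60,000 training images and
# 10,000 test images, each 28 by 28 pixels, with their digit labels.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)
```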

The process also chooses one of the digits and casts it into a form that allows it to be displayed. It does this using the "Windowing" operator followed by "De-Pivot" to transform the matrix-like data into x,y,z tuples.
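The transformation itself is straightforward: each pixel becomes one row holding its column (x), row (y), and intensity (z). A numpy sketch of the same idea, using a random placeholder image instead of a real digit:

```python
import numpy as np

image = np.random.randint(0, 256, size=(28, 28))  # stand-in for one MNIST digit

# Row (y) and column (x) index of every pixel, then one (x, y, z) row each.
ys, xs = np.indices(image.shape)
tuples = np.column_stack([xs.ravel(), ys.ravel(), image.ravel()])
print(tuples.shape)  # (784, 3): one (x, y, z) row per pixel
```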

Here's the 6th digit displayed using a block chart. This looks like a 2.

I've already used R with Keras to create a classifier that can recognise these digits. This is my first step towards using Keras in RapidMiner to build a classifier to do the same job.

Visualising discrete wavelet transforms: updated for RapidMiner v8

I revisited a previous post about visualising discrete wavelet transforms because I wanted to remember how I did something. The process is quite old and did not work first time with version 8 of RapidMiner Studio: there have been some subtle changes to the attribute-type requirements of the "Join" and "De-Pivot" operators. Never fear, I've updated the process and it's here.

Here is the money shot to prove it still works:

An interesting feature of this process is the way it uses the "De-Pivot" operator to transform a matrix-like example set into x,y,z coordinates that can be plotted.