You can download the process here.
It requires R and Keras to be installed - an exercise for the reader.
The main features of the process are:
- R is used to get the MNIST data and create a training set and a test set. The label is joined to the data in R so that each of the two cases becomes a single example set, rather than keeping separate structures for the data and the labels. This is a big strength of RapidMiner: all the data and labels are in one place.
- The data is restructured in R to change 3d tensors of shape (60000, 28, 28) into 2d tensors of shape (60000, 784). The 3d tensor represents the images, each 28 by 28 pixels (28 × 28 = 784). RapidMiner example sets are 2d tensors, and these can be fed directly into the Keras part of the process.
- The Keras part of the process has the following characteristics:
- The input shape is (784,), which matches the number of columns in the 2d tensor.
- The loss parameter is set to "categorical_crossentropy" and the optimizer is set to "RMSprop".
- There are two layers in the Keras model. The first is "dense" with 512 units and activation set to "relu". The second is "dense" with 10 units and activation set to "softmax". The 10 here is the number of distinct values the label can take (the digits 0 to 9).
- The "validation_split" parameter is set to 0.1 so that a loss is also calculated on a small held-out part of the training data. This produces validation loss results in the output, which are used to see when over-fitting starts to happen.
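The reshape step above can be sketched in pure Python. The real process does this in R on the full MNIST arrays; the tiny 2×2 images here stand in for the 28×28 digits, but the flattening logic is the same as going from (60000, 28, 28) to (60000, 784):

```python
def flatten_images(images):
    # Flatten a batch of images (a 3d tensor of shape n x h x w) into a
    # 2d tensor of shape (n, h*w), row by row, mirroring the reshape
    # from (60000, 28, 28) to (60000, 784).
    return [[pixel for row in image for pixel in row] for image in images]

# Two toy 2x2 "images" instead of 28x28 MNIST digits.
batch = [
    [[0, 1],
     [2, 3]],
    [[4, 5],
     [6, 7]],
]

flat = flatten_images(batch)
print(flat)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each row of the resulting 2d tensor is one example, which is exactly the shape a RapidMiner example set expects.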
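To make the "softmax" and "categorical_crossentropy" settings concrete, here is a minimal pure-Python sketch of what they compute. Keras does all of this internally; the three-class logits below are made up for illustration (MNIST uses 10 classes):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, exponentiate,
    # then normalise so the outputs sum to 1 (a probability distribution).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred):
    # y_true is a one-hot vector, so the loss reduces to minus the log of
    # the probability the model assigned to the true class.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# Toy 3-class example: class 0 is the true label.
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)
print(probs, loss)
```

The loss is small when the softmax output puts most of its probability on the true class, which is what training drives the network towards.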
Here is a screenshot of the history from a large run (this is output from the Keras model as an example set).
The training loss (in blue) decreases systematically as the model fits the training data more and more closely. The loss on the validation data (in red) worsens as the number of epochs increases, and the variation between epochs is evidence that perhaps a larger training fraction should be used. Nonetheless, only a small number of epochs would be enough to get a model that performs well on unseen data.
The Keras model does not use convolution layers (an exercise for a later post), but despite this it performs very well. Here is the confusion matrix after training for 3 epochs.
This is a very good result and shows the power of deep learning. It's gratifying that RapidMiner supports it.
As time permits, a future post will look at using convolution layers to see what improvements could be achieved. I may also do some systematic experiments to check how validation loss measured during training maps to actual loss on unseen data.