We have a data set of handwritten digits (MNIST) and our aim is to build a classifier that identifies which digit an image represents. In technical terms, we have to design a classifier with 10 classes, one for each digit. We will use three strategies to solve the same problem: a Bayes optimal classifier built on a Gaussian generative model, softmax (multinomial logistic) regression trained with gradient descent, and a feedforward neural network trained with backpropagation.
Data was obtained from the following website:
http://yann.lecun.com/exdb/mnist/index.html
Each digit is represented as a 28x28 pixel grayscale image, i.e. 784 pixels. Hence a typical "feature vector" we draw out of an image is 784-dimensional.
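To make this concrete, here is a minimal sketch of loading the gzipped IDX files from the site above into 784-dimensional feature vectors. The file names match the standard MNIST archive, but the paths and the rescaling to [0, 1] are illustrative assumptions, not details from the original project.

```python
import gzip
import numpy as np

def load_images(path):
    # IDX image files start with a 16-byte header (magic, count, rows, cols),
    # followed by one unsigned byte per pixel.
    with gzip.open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    # Flatten each 28x28 image into a 784-dimensional feature vector.
    return data.reshape(-1, 784).astype(np.float64) / 255.0

def load_labels(path):
    # IDX label files start with an 8-byte header (magic, count).
    with gzip.open(path, "rb") as f:
        return np.frombuffer(f.read(), dtype=np.uint8, offset=8)

X_train = load_images("train-images-idx3-ubyte.gz")
y_train = load_labels("train-labels-idx1-ubyte.gz")
X_test = load_images("t10k-images-idx3-ubyte.gz")
y_test = load_labels("t10k-labels-idx1-ubyte.gz")
```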
We perform handwritten digit classification by designing a Bayes optimal classifier using a Gaussian generative model on the MNIST dataset. We model each class (one per digit) as a multivariate (784-dimensional) Gaussian. This is called generative parametric modeling because we model the classes and NOT the class boundaries.
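A minimal sketch of this generative model follows, assuming the numpy arrays loaded above. Each class gets a prior, a 784-dimensional mean, and a 784x784 covariance; the regularization constant `reg` is an illustrative assumption (many MNIST pixels have zero variance, so the covariance needs smoothing to stay invertible), not a value from the original project.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_model(X, y, reg=0.1):
    classes = np.unique(y)
    priors, means, covs = {}, {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)              # class prior P(y = c)
        means[c] = Xc.mean(axis=0)                # 784-dimensional class mean
        # Regularized 784x784 class covariance.
        covs[c] = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
    return classes, priors, means, covs

def predict(X, classes, priors, means, covs):
    # Score each class by log prior + Gaussian log likelihood, pick the maximum.
    scores = np.stack([
        np.log(priors[c]) + multivariate_normal.logpdf(X, means[c], covs[c])
        for c in classes
    ], axis=1)
    return classes[np.argmax(scores, axis=1)]
```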
To gain more insight into how the Bayes classifier works, we can do two visualizations.
The latter plot reveals some interesting patterns: the classifier predicts a lot of "5"s and "3"s as "8"s. One thing we could infer from this is that the lower half loop in "3" and "5" may be perceived as the lower half loop of an "8". It was also observed that, since many people do not write the lower half loop of "9", many cases whose true class was "1" or "7" were predicted as "9". The most misclassifications occur for the digit "5", followed by a few for "7". This Bayes classifier had an error of 0.16 (roughly 84% accuracy) on the test set. It was implemented in Python.
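One way to produce this kind of misclassification plot is a confusion matrix over the test set. The sketch below assumes the classifier and arrays defined above; scikit-learn and matplotlib are assumptions for illustration, not necessarily what the original Python implementation used.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_pred = predict(X_test, classes, priors, means, covs)

# Rows are the true digit, columns the predicted digit; off-diagonal cells
# show which digits are confused with which.
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels=list(range(10))).plot()
plt.title("Bayes classifier confusion matrix on the MNIST test set")
plt.show()

print("Test error:", (y_pred != y_test).mean())
```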
We define the softmax function as follows: for a score vector z = (z_1, ..., z_10), softmax(z)_j = exp(z_j) / sum_k exp(z_k), which turns the scores into class probabilities. We will use iterative gradient descent to solve for the weight matrices.
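The original implementation of this classifier was in MATLAB; the following is a minimal numpy sketch of softmax regression trained with batch gradient descent, where the learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def softmax(Z):
    # Subtract the row-wise max for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def train_softmax(X, y, num_classes=10, lr=0.1, epochs=1000):
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]                 # one-hot encode the labels
    for _ in range(epochs):
        P = softmax(X @ W + b)                 # predicted class probabilities
        grad_W = X.T @ (P - Y) / n             # gradient of the cross-entropy loss
        grad_b = (P - Y).mean(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b

def predict_softmax(X, W, b):
    return np.argmax(X @ W + b, axis=1)
```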
This approach gives us the flexibility to tune the classifier's hyperparameters. Depending on the accuracy we need, we can change the stopping condition.
However, like most gradient descent algorithms, it takes a long time to reach high accuracy. Accuracy climbs quickly at first, reaching 70-75% within a few iterations, but every successive percentage point requires more epochs. I was able to achieve around 80% in 1000 epochs. This SGD algorithm was implemented in MATLAB.
Here we train a feedforward neural network using backpropagation to determine the weights that predict the class of a given input. Since the input is of length 784, the input layer has 784 nodes, and the output layer has 10 nodes (one per class). We then create a densely connected feedforward network between these two layers. We are free to choose the intermediate hidden architecture: the number of hidden layers, the number of nodes in each layer, and so on. A sketch of one such network is given below.
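The following is a minimal numpy sketch of a 784-hidden-10 network trained with backpropagation on mini-batches, assuming the arrays loaded earlier. The hidden-layer size, learning rate, batch size, and epoch count are illustrative assumptions, not the original project's settings.

```python
import numpy as np

def one_hot(y, k=10):
    return np.eye(k)[y]

def train_mlp(X, y, hidden=128, lr=0.1, epochs=50, batch=128, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random initial weights for the two dense layers.
    W1 = rng.normal(0, 0.01, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.01, (hidden, 10)); b2 = np.zeros(10)
    Y = one_hot(y)
    for _ in range(epochs):
        for i in range(0, n, batch):
            Xb, Yb = X[i:i + batch], Y[i:i + batch]
            # Forward pass: ReLU hidden layer, softmax output.
            H = np.maximum(0, Xb @ W1 + b1)
            Z = H @ W2 + b2
            Z -= Z.max(axis=1, keepdims=True)
            P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)
            # Backward pass: gradients of the mean cross-entropy loss.
            dZ = (P - Yb) / len(Xb)
            dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
            dH = dZ @ W2.T
            dH[H <= 0] = 0                      # gradient through the ReLU
            dW1, db1 = Xb.T @ dH, dH.sum(axis=0)
            # Gradient descent update.
            W1 -= lr * dW1; b1 -= lr * db1
            W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict_mlp(X, W1, b1, W2, b2):
    H = np.maximum(0, X @ W1 + b1)
    return np.argmax(H @ W2 + b2, axis=1)
```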
Results: the neural network achieved a test accuracy of 96.08%.