MNIST classification is an essential task in machine learning, where neural networks are trained to recognize and categorize handwritten digits from the MNIST dataset. This dataset contains 70,000 grayscale images of digits (0-9), each sized 28x28 pixels. By using techniques such as convolutional neural networks (CNNs), models can learn to extract important features from the images, enabling accurate digit recognition. The MNIST classification problem is often a starting point for exploring neural network architectures, optimization techniques, and model evaluation, serving as a benchmark for testing the effectiveness of AI systems in image recognition tasks.
The classes used are described in detail here.
There are three programs.
The program network1_bin is designed to load and display sample MNIST images and their corresponding labels. It uses a simple logging system that logs either to the console or to a file, based on the bFileLog flag.
I implemented a class for reading and processing MNIST datasets in C++. It efficiently handles both the training and testing images and labels from the standard MNIST dataset files, which include the “train-images-idx3-ubyte” and “train-labels-idx1-ubyte” for training data and “t10k-images-idx3-ubyte” and “t10k-labels-idx1-ubyte” for testing data. The class constructor ensures that the specified directory exists, and it checks for the presence of the required files. The code reads the binary MNIST data and validates the magic numbers in both image and label files to confirm the correct file format.
Endian swapping is performed to handle byte-ordering issues when reading the integer values in the MNIST files. The constructor reads the number of items, rows, and columns for the images, ensuring they match the expected MNIST dimensions. It then iterates through the dataset, loading raw pixel data and binary label information into their respective vectors. Each image's pixel data is transformed into a binary format, where non-zero pixel values are converted to 1, creating a simplified black-and-white representation of the MNIST images. Labels are stored in one-hot encoding, where the index corresponding to the actual label is set to 1.
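To make the file-format handling concrete, here is a minimal, self-contained sketch of reading an MNIST image file: validating the magic number, swapping endianness, and binarizing the pixels. The helper names (SwapEndian, ReadBigEndianInt, LoadImages) are illustrative and do not necessarily match the names used in the actual class.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Swap the byte order of a 32-bit value (the MNIST files are big-endian).
static uint32_t SwapEndian(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Read one 32-bit big-endian integer from the stream.
static uint32_t ReadBigEndianInt(std::ifstream& in) {
    uint32_t v = 0;
    in.read(reinterpret_cast<char*>(&v), sizeof(v));
    return SwapEndian(v);
}

// Load an MNIST image file, validate the magic number and binarize the pixels.
std::vector<std::vector<uint8_t>> LoadImages(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    const uint32_t magic = ReadBigEndianInt(in);   // 2051 (0x803) for image files
    if (magic != 2051) throw std::runtime_error("bad magic number");

    const uint32_t nItems = ReadBigEndianInt(in);
    const uint32_t nRows  = ReadBigEndianInt(in);  // expected: 28
    const uint32_t nCols  = ReadBigEndianInt(in);  // expected: 28

    std::vector<std::vector<uint8_t>> images(nItems,
        std::vector<uint8_t>(nRows * nCols));
    for (auto& img : images) {
        in.read(reinterpret_cast<char*>(img.data()),
                static_cast<std::streamsize>(img.size()));
        for (auto& px : img) px = (px != 0) ? 1 : 0;  // black-and-white version
    }
    return images;
}
```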
Additionally, the class provides methods for swapping endianness, retrieving raw images and labels, and displaying the processed image and label information. For instance, the PrintImage method outputs an ASCII representation of the image, while the PrintLabel method outputs the one-hot encoded label vector. The ImageRaw and LabelNumeric methods provide access to the raw image and label data for further processing. This implementation ensures a clean and flexible interface for reading and interacting with MNIST data for various machine learning or neural network tasks.
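A hypothetical usage sketch might look like the following; only the method names come from the description above, while the class name, constructor argument, and return types are assumptions:

```cpp
// Hypothetical usage sketch: the constructor argument and return types are
// assumptions; only the method names come from the description above.
MNIST mnist("./data");               // directory containing the idx files

mnist.PrintImage(0);                 // ASCII rendering of the first image
mnist.PrintLabel(0);                 // its one-hot encoded label vector

auto images = mnist.ImageRaw();      // raw image data for further processing
auto labels = mnist.LabelNumeric();  // numeric label data
```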
Once the files are read, the program prints a specific image to the console and outputs the corresponding label, providing a visual and numeric understanding of the dataset.
The program network2_bin is designed to handle both training and testing of a neural network model using the Stochastic Gradient Descent (SGD) algorithm on the MNIST dataset. It supports the following modes based on command-line arguments:
Command-line Argument Handling:
- --training_start: Starts training from scratch.
- --training_continue: Continues training from a previously saved state.
- --testing: Skips training and directly proceeds with testing the model.
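A minimal sketch of how such flags could be parsed is shown below; this is illustrative rather than the program's actual argument handling (the flag variable names mirror those mentioned later in the text):

```cpp
#include <cstring>
#include <iostream>

int main(int argc, char* argv[]) {
    bool trainingFromStart = false;  // start training from scratch
    bool doTraining        = false;  // train (true) or test only (false)

    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--training_start") == 0) {
            trainingFromStart = true;
            doTraining = true;
        } else if (std::strcmp(argv[i], "--training_continue") == 0) {
            doTraining = true;               // resume from a saved state
        } else if (std::strcmp(argv[i], "--testing") == 0) {
            doTraining = false;              // skip training, only run the tests
        } else {
            std::cerr << "unknown option: " << argv[i] << "\n";
            return 1;
        }
    }

    // ... training / testing logic follows ...
    return 0;
}
```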
Training and Testing:
nepoch determines the number of epochs (iterations) for training.

This program implements a stochastic gradient descent (SGD) neural network, specifically a multilayer perceptron (MLP), designed for training on and testing the MNIST dataset. The neural network is structured using the ANN_MLP_SGD class, which represents a standard feedforward neural network trained with SGD. The program includes functionality to either start the training from scratch or resume from a previously saved state by deserializing a stored network model.
In the case of training, if the flag trainingFromStart is set, a new network is initialized with a specific architecture (nnsize, or the alternative commented line which specifies layers 784, 30, 10). The first number, 784, represents the number of input neurons, which corresponds to the MNIST images that are 28x28 pixels (28 * 28 = 784). Each pixel is treated as an input feature to the neural network. The last number, 10, corresponds to the number of output neurons, representing the 10 possible digits (0 through 9) that the network is tasked with classifying. The middle number (30 in the commented version) specifies a hidden layer of neurons that allows the network to model more complex patterns within the data.
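As a hedged sketch of this initialization (the constructor signature of ANN_MLP_SGD is an assumption; the class name and layer sizes come from the text):

```cpp
// Layer sizes: 784 inputs (28x28 pixels), 30 hidden neurons, 10 outputs.
// The constructor signature is assumed for illustration; only the class name
// ANN_MLP_SGD and the layer sizes come from the description above.
std::vector<size_t> nnsize = {784, 30, 10};
ANN_MLP_SGD net(nnsize);
```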
The network is then trained on the dataset using the TrainSGD function, which takes the images, labels, number of epochs (nEpochs), mini-batch size (miniBatchSize), and learning rate (eta). After training is completed, the network is serialized and saved to a file so that training can be resumed later if needed. A log statement is printed to indicate that the training is completed.
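A sketch of the training call might look like this; the text only states which quantities TrainSGD takes, so the argument order, the hyperparameter values, and the serialization call name are assumptions:

```cpp
// Hyperparameter values chosen purely for illustration.
const size_t nEpochs       = 5;
const size_t miniBatchSize = 10;
const double eta           = 3.0;   // learning rate

// Argument order is assumed; the text only lists what TrainSGD takes
// (images, labels, number of epochs, mini-batch size, learning rate).
net.TrainSGD(images, labels, nEpochs, miniBatchSize, eta);

// Save the trained network so training can be resumed later.
// The method name Serialize is an assumption.
net.Serialize(archiveFile);
```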
In the testing mode (if doTraining is false), the program initializes a test MNIST object and deserializes the previously trained network from the archiveFile. It loads the test images and labels, and then runs the TestSGD function to evaluate the network's performance on the test data. The correct predictions are counted, and the accuracy is logged, showing how well the network performed.
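A corresponding testing sketch, again with assumed method names and return values (only archiveFile and TestSGD appear in the description above):

```cpp
// Restore the previously trained network; the method name Deserialize
// is an assumption based on the description above.
ANN_MLP_SGD net;
net.Deserialize(archiveFile);

// Assumed: TestSGD returns the number of correctly classified test images.
const size_t correct = net.TestSGD(testImages, testLabels);
std::cout << "Accuracy: "
          << 100.0 * correct / testImages.size() << " %" << std::endl;
```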
The choice of 784 input neurons and 10 output neurons is specific to the MNIST dataset. Each image is a flattened 28x28 grid of pixels, which results in 784 input features. The network outputs a probability distribution across the 10 possible digits (0-9), with the highest probability corresponding to the predicted digit.
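To make the mapping concrete, a 28x28 image is flattened row by row into a 784-element input vector, and the predicted digit is the index of the largest of the 10 output activations:

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Flatten a 28x28 image into a 784-element input vector (row-major order).
std::vector<double> Flatten(const std::array<std::array<double, 28>, 28>& image) {
    std::vector<double> input;
    input.reserve(28 * 28);
    for (const auto& row : image)
        input.insert(input.end(), row.begin(), row.end());
    return input;                    // input.size() == 784
}

// The predicted digit is the index of the largest of the 10 output values.
int PredictedDigit(const std::vector<double>& output) {
    return static_cast<int>(
        std::max_element(output.begin(), output.end()) - output.begin());
}
```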
Performance: The program achieves good accuracy after 5 epochs, reaching more than 90% accuracy.
The program network3_bin uses a Genetic Algorithm (GA) for training a neural network on the MNIST dataset. Like the other programs, it accepts command-line arguments to control whether it should start training from scratch, continue training, or run in testing mode:
Command-line Argument Handling:
- --training_start: Starts training from scratch.
- --training_continue: Continues training from a previously saved state.
- --testing: Skips training and directly proceeds with testing the model.
Training and Testing:
The code is similar to that of network2_bin, with the difference that the class used is no longer ANN_MLP_SGD; the equivalent ANN_MLP_GA is used instead.
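In other words, only the class changes; a minimal sketch (constructor details assumed, mirroring the SGD sketch above):

```cpp
// Same layer sizes as before; only the class differs.
// The constructor signature is an assumption for illustration.
ANN_MLP_GA net({784, 30, 10});
```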
Despite running for more than 100,000 generations, the accuracy remains low, around 30%. This slow convergence is expected because Genetic Algorithms are not well-suited for optimizing neural networks on tasks like MNIST digit classification, where gradient-based methods such as SGD are more effective.
The program not only highlights the limitations of Genetic Algorithms for efficient convergence on tasks like MNIST training, but also serves as a demonstration of how to use my library, which can be applied to other tasks better suited to GAs or where SGD is not applicable, such as simulating interactions in a game.
The complete code is available on GitHub. The repository, available at the following link, contains detailed instructions on how to set up the environment, compile the code, and run the training and the tests.