MNIST classification is an essential task in machine learning, where neural networks are trained to recognize and categorize handwritten digits from the MNIST dataset. This dataset contains 70,000 grayscale images of digits (0-9), each sized 28x28 pixels. By using techniques such as convolutional neural networks (CNNs), models can learn to extract important features from the images, enabling accurate digit recognition. The MNIST classification problem is often a starting point for exploring neural network architectures, optimization techniques, and model evaluation, serving as a benchmark for testing the effectiveness of AI systems in image recognition tasks.
The classes used are described in detail here.
There are three programs.
The program network1_bin is designed to load and display sample MNIST images and their corresponding labels. It uses a simple logging system that logs either to the console or to a file, based on the bFileLog flag.
I implemented a class for reading and processing MNIST datasets in C++. It efficiently handles both the training and testing images and labels from the standard MNIST dataset files, which include the “train-images-idx3-ubyte” and “train-labels-idx1-ubyte” for training data and “t10k-images-idx3-ubyte” and “t10k-labels-idx1-ubyte” for testing data. The class constructor ensures that the specified directory exists, and it checks for the presence of the required files. The code reads the binary MNIST data and validates the magic numbers in both image and label files to confirm the correct file format.
Endian swapping is performed to handle byte-ordering issues when reading the integer values in the MNIST files. The constructor reads the number of items, rows, and columns for the images, ensuring they match the expected MNIST dimensions. It then iterates through the dataset, loading raw pixel data and binary label information into their respective vectors. Each image's pixel data is transformed into a binary format, where non-zero pixel values are converted to 1, creating a simplified black-and-white representation of the MNIST images. Labels are stored in one-hot encoding, where the index corresponding to the actual label is set to 1.
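To make the file-format handling concrete, here is a minimal, self-contained sketch of reading an MNIST image file: validating the magic number, swapping endianness, and binarizing the pixels. The helper names (SwapEndian, ReadBigEndianInt, LoadImages) are illustrative and do not necessarily match the names used in the actual class.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Swap the byte order of a 32-bit value (the MNIST files are big-endian).
static uint32_t SwapEndian(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Read one 32-bit big-endian integer from the stream.
static uint32_t ReadBigEndianInt(std::ifstream& in) {
    uint32_t v = 0;
    in.read(reinterpret_cast<char*>(&v), sizeof(v));
    return SwapEndian(v);
}

// Load an MNIST image file, validate the magic number and binarize the pixels.
std::vector<std::vector<uint8_t>> LoadImages(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    const uint32_t magic = ReadBigEndianInt(in);   // 2051 (0x803) for image files
    if (magic != 2051) throw std::runtime_error("bad magic number");

    const uint32_t nItems = ReadBigEndianInt(in);
    const uint32_t nRows  = ReadBigEndianInt(in);  // expected: 28
    const uint32_t nCols  = ReadBigEndianInt(in);  // expected: 28

    std::vector<std::vector<uint8_t>> images(nItems,
        std::vector<uint8_t>(nRows * nCols));
    for (auto& img : images) {
        in.read(reinterpret_cast<char*>(img.data()),
                static_cast<std::streamsize>(img.size()));
        for (auto& px : img) px = (px != 0) ? 1 : 0;  // black-and-white version
    }
    return images;
}
```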
Additionally, the class provides methods for swapping endianness, retrieving raw images and labels, and displaying the processed image and label information. For instance, the PrintImage method outputs an ASCII representation of the image, while the PrintLabel method outputs the one-hot encoded label vector. The ImageRaw and LabelNumeric methods provide access to the raw image and label data for further processing. This implementation ensures a clean and flexible interface for reading and interacting with MNIST data for various machine learning or neural network tasks.
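A hypothetical usage sketch might look like the following; only the method names come from the description above, while the class name, constructor argument, and return types are assumptions:

```cpp
// Hypothetical usage sketch: the constructor argument and return types are
// assumptions; only the method names come from the description above.
MNIST mnist("./data");               // directory containing the idx files

mnist.PrintImage(0);                 // ASCII rendering of the first image
mnist.PrintLabel(0);                 // its one-hot encoded label vector

auto images = mnist.ImageRaw();      // raw image data for further processing
auto labels = mnist.LabelNumeric();  // numeric label data
```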
Once the files are read, the program prints a specific image to the console and outputs the corresponding label, providing a visual and numeric understanding of the dataset.
The program network2_bin is designed to handle both training and testing of a neural network model using the Stochastic Gradient Descent (SGD) algorithm on the MNIST dataset. It supports the following modes based on command-line arguments:
Command-line Argument Handling:
- --training_start: Starts training from scratch.
- --training_continue: Continues training from a previously saved state.
- --testing: Skips training and directly proceeds with testing the model.
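A minimal sketch of how such flags could be parsed is shown below; this is illustrative rather than the program's actual argument handling (the flag variable names mirror those mentioned later in the text):

```cpp
#include <cstring>
#include <iostream>

int main(int argc, char* argv[]) {
    bool trainingFromStart = false;  // start training from scratch
    bool doTraining        = false;  // train (true) or test only (false)

    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--training_start") == 0) {
            trainingFromStart = true;
            doTraining = true;
        } else if (std::strcmp(argv[i], "--training_continue") == 0) {
            doTraining = true;               // resume from a saved state
        } else if (std::strcmp(argv[i], "--testing") == 0) {
            doTraining = false;              // skip training, only run the tests
        } else {
            std::cerr << "unknown option: " << argv[i] << "\n";
            return 1;
        }
    }

    // ... training / testing logic follows ...
    return 0;
}
```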
Training and Testing:
nepoch determines the number of epochs (iterations) for training.

This program implements a stochastic gradient descent (SGD) neural network, specifically a multilayer perceptron (MLP), designed for training on and testing the MNIST dataset. The neural network is structured using the ANN_MLP_SGD class, which represents a standard feedforward neural network trained with SGD. The program includes functionality to either start the training from scratch or resume from a previously saved state by deserializing a stored network model.
In the case of training, if the flag trainingFromStart is set, a new network is initialized with a specific architecture (nnsize, or the alternative commented line which specifies layers 784, 30, 10). The first number, 784, represents the number of input neurons, which corresponds to the MNIST images that are 28x28 pixels (28 * 28 = 784). Each pixel is treated as an input feature to the neural network. The last number, 10, corresponds to the number of output neurons, representing the 10 possible digits (0 through 9) that the network is tasked with classifying. The middle number (30 in the commented version) specifies a hidden layer of neurons that allows the network to model more complex patterns within the data.
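As a hedged sketch of this initialization (the constructor signature of ANN_MLP_SGD is an assumption; the class name and layer sizes come from the text):

```cpp
// Layer sizes: 784 inputs (28x28 pixels), 30 hidden neurons, 10 outputs.
// The constructor signature is assumed for illustration; only the class name
// ANN_MLP_SGD and the layer sizes come from the description above.
std::vector<size_t> nnsize = {784, 30, 10};
ANN_MLP_SGD net(nnsize);
```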
The network is then trained on the dataset using the TrainSGD function, which takes the images, labels, number of epochs (nEpochs), mini-batch size (miniBatchSize), and learning rate (eta). After training is completed, the network is serialized and saved to a file so that training can be resumed later if needed. A log statement is printed to indicate that the training is completed.
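A sketch of the training call might look like this; the text only states which quantities TrainSGD takes, so the argument order, the hyperparameter values, and the serialization call name are assumptions:

```cpp
// Hyperparameter values chosen purely for illustration.
const size_t nEpochs       = 5;
const size_t miniBatchSize = 10;
const double eta           = 3.0;   // learning rate

// Argument order is assumed; the text only lists what TrainSGD takes
// (images, labels, number of epochs, mini-batch size, learning rate).
net.TrainSGD(images, labels, nEpochs, miniBatchSize, eta);

// Save the trained network so training can be resumed later.
// The method name Serialize is an assumption.
net.Serialize(archiveFile);
```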
In the testing mode (if doTraining is false), the program initializes a test MNIST object and deserializes the previously trained network from the archiveFile. It loads the test images and labels, and then runs the TestSGD function to evaluate the network's performance on the test data. The correct predictions are counted, and the accuracy is logged, showing how well the network performed.
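A corresponding testing sketch, again with assumed method names and return values (only archiveFile and TestSGD appear in the description above):

```cpp
// Restore the previously trained network; the method name Deserialize
// is an assumption based on the description above.
ANN_MLP_SGD net;
net.Deserialize(archiveFile);

// Assumed: TestSGD returns the number of correctly classified test images.
const size_t correct = net.TestSGD(testImages, testLabels);
std::cout << "Accuracy: "
          << 100.0 * correct / testImages.size() << " %" << std::endl;
```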
The choice of 784 input neurons and 10 output neurons is specific to the MNIST dataset. Each image is a flattened 28x28 grid of pixels, which results in 784 input features. The network outputs a probability distribution across the 10 possible digits (0-9), with the highest probability corresponding to the predicted digit.
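To make the mapping concrete, a 28x28 image is flattened row by row into a 784-element input vector, and the predicted digit is the index of the largest of the 10 output activations:

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Flatten a 28x28 image into a 784-element input vector (row-major order).
std::vector<double> Flatten(const std::array<std::array<double, 28>, 28>& image) {
    std::vector<double> input;
    input.reserve(28 * 28);
    for (const auto& row : image)
        input.insert(input.end(), row.begin(), row.end());
    return input;                    // input.size() == 784
}

// The predicted digit is the index of the largest of the 10 output values.
int PredictedDigit(const std::vector<double>& output) {
    return static_cast<int>(
        std::max_element(output.begin(), output.end()) - output.begin());
}
```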
Performance: The program achieves good accuracy after 5 epochs, reaching more than 90% accuracy.
The program network3_bin uses a Genetic Algorithm (GA) for training a neural network on the MNIST dataset. Like the other programs, it accepts command-line arguments to control whether it should start training from scratch, continue training, or run in testing mode:
Command-line Argument Handling:
- --training_start: Starts training from scratch.
- --training_continue: Continues training from a previously saved state.
- --testing: Skips training and directly proceeds with testing the model.
Training and Testing:
The code is similar to that of network2_bin, with the difference that the class used is no longer ANN_MLP_SGD; the equivalent ANN_MLP_GA is used instead.
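In other words, only the class changes; a minimal sketch (constructor details assumed, mirroring the SGD sketch above):

```cpp
// Same layer sizes as before; only the class differs.
// The constructor signature is an assumption for illustration.
ANN_MLP_GA net({784, 30, 10});
```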
Despite running for more than 100,000 generations, the accuracy remains low, around 30%. This slow convergence is expected because Genetic Algorithms are not well-suited for optimizing neural networks on tasks like MNIST digit classification, where gradient-based methods such as SGD are more effective.
The program not only highlights the limitations of Genetic Algorithms for efficient convergence on tasks like MNIST training, but also serves as a demonstration of how to use my library, which can be applied to other tasks better suited to GAs or where SGD is not applicable, such as simulating interactions in a game.
The complete code is available on GitHub. The repository, available at the following link, contains detailed instructions on how to set up the environment, compile the code, and run the training and the tests.