MNIST classification
In this post, I will walk through my process of training neural networks on the MNIST dataset using both Stochastic Gradient Descent (SGD) and Genetic Algorithms (GA). The MNIST dataset, consisting of 70,000 grayscale images of handwritten digits, serves as an ideal starting point for exploring machine learning models. Each image is 28x28 pixels, providing a consistent input size of 784 features per image. I implemented three C++ programs to handle the loading, training, and testing phases efficiently, while focusing on the performance and suitability of different optimization techniques.
Loading MNIST data in C++
To start, I created a class for reading and processing the MNIST dataset in C++. This class reads both training and testing images and labels from the provided IDX format files. The constructor checks that the dataset files exist, reads the binary data from them, and verifies the file format using the magic number stored at the start of each file. Endian swapping is performed because the IDX header fields are stored in big-endian order, while most desktop CPUs are little-endian. Once the raw pixel data and labels are read, I binarize the pixel values for simplicity: every non-zero pixel becomes 1, effectively producing a black-and-white representation of each image.
Here is a snippet of the code to handle endian swapping:
// Swap the byte order of a 32-bit value: the IDX header fields are
// big-endian, so they must be reversed on little-endian machines.
uint32_t nn::MNIST::SwapEndian(uint32_t val)
{
    uint32_t r = val;
    char* buf = reinterpret_cast<char*>(&r);
    std::swap(buf[0], buf[3]);
    std::swap(buf[1], buf[2]);
    return r;
}
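For context, here is a rough sketch of how such a header can be parsed and the pixels binarized. Everything in it (the function name, the local variables, and the local copy of the byte-swap helper) is illustrative rather than the actual class code:

#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Local copy of the byte-swap helper shown above, so the sketch is self-contained.
static uint32_t SwapEndianLocal(uint32_t val)
{
    uint32_t r = val;
    char* buf = reinterpret_cast<char*>(&r);
    std::swap(buf[0], buf[3]);
    std::swap(buf[1], buf[2]);
    return r;
}

// Read an IDX image file and return binarized images (0/1 per pixel).
static std::vector<std::vector<uint8_t>> ReadImagesSketch(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    if (!f)
        throw std::runtime_error("cannot open " + path);

    uint32_t magic = 0, count = 0, rows = 0, cols = 0;
    f.read(reinterpret_cast<char*>(&magic), 4);
    f.read(reinterpret_cast<char*>(&count), 4);
    f.read(reinterpret_cast<char*>(&rows), 4);
    f.read(reinterpret_cast<char*>(&cols), 4);

    // Header fields are stored big-endian; swap them before use.
    magic = SwapEndianLocal(magic);   // 0x00000803 is expected for image files
    count = SwapEndianLocal(count);
    rows  = SwapEndianLocal(rows);
    cols  = SwapEndianLocal(cols);
    if (magic != 0x00000803)
        throw std::runtime_error("unexpected magic number in " + path);

    std::vector<std::vector<uint8_t>> images(count, std::vector<uint8_t>(rows * cols));
    for (auto& img : images)
    {
        f.read(reinterpret_cast<char*>(img.data()), static_cast<std::streamsize>(img.size()));
        for (auto& px : img)
            px = (px != 0) ? 1 : 0;   // binarize: any non-zero pixel becomes 1
    }
    return images;
}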
The images are stored in vectors, and the class provides methods to print them in a simple ASCII format for visualization. Labels are stored as one-hot encoded vectors, making them suitable as training targets for the network's 10-neuron output layer.
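The one-hot encoding itself is straightforward: a digit label d in 0-9 becomes a length-10 vector with a single 1 at position d. A minimal example (the function name is mine, not the class's):

#include <cstdint>
#include <vector>

// Turn a digit label (0-9) into a one-hot target vector,
// e.g. 3 -> {0, 0, 0, 1, 0, 0, 0, 0, 0, 0}.
std::vector<float> OneHot(uint8_t label)
{
    std::vector<float> target(10, 0.0f);
    target[label] = 1.0f;
    return target;
}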
Training with stochastic gradient descent
The first program I wrote, network2_bin, implements a multilayer perceptron (MLP) trained using Stochastic Gradient Descent (SGD). This method updates the network's weights incrementally after each mini-batch of images during training. I chose an architecture with 784 input neurons (one per pixel), 30 hidden neurons, and 10 output neurons (one per digit, 0-9).
Training is controlled by specifying the number of epochs, the mini-batch size, and the learning rate (eta). Training can start from scratch or continue from a previously saved state by deserializing the model. After training is complete, the model is serialized and saved for future use.
if (doTraining)
{
    // Build a 784-30-10 MLP and train it with mini-batch SGD.
    nn1 = std::make_unique<nn::ANN_MLP_SGD<float>>(std::vector<size_t>{784, 30, 10});
    nn1->TrainSGD(images, labels, nEpochs, miniBatchSize, eta);

    // Save the trained weights so training can be resumed or reused later.
    nn1->Serialize(archiveFile);
}
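When training is skipped, the saved model is loaded back instead. The branch below is only a sketch of that path; Deserialize and the evaluation helper are assumed names mirroring Serialize, not necessarily the library's actual API:

else
{
    // Rebuild the same 784-30-10 network and restore the saved weights.
    nn1 = std::make_unique<nn::ANN_MLP_SGD<float>>(std::vector<size_t>{784, 30, 10});
    nn1->Deserialize(archiveFile);   // assumed counterpart to Serialize
}

// Hypothetical evaluation step on the held-out test set:
// double accuracy = EvaluateAccuracy(*nn1, testImages, testLabels);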
Using SGD, my model achieved over 90% accuracy on the test data within just 5 epochs, highlighting the effectiveness of this approach for image classification tasks such as MNIST.
Genetic algorithms for neural networks
The third program, network3_bin, explores the use of a Genetic Algorithm (GA) for training the same MLP. A GA optimizes the network's weights through evolutionary operations, selection, crossover, and mutation, applied to a population of candidate weight sets. However, GAs are not well suited to a task like MNIST, where gradient-based methods such as SGD outperform them significantly. Even after running for over 100,000 generations, the GA-trained network reached only about 30% accuracy on the test data.
The results demonstrate that while GAs may be useful for other optimization tasks, they are far less effective for training neural network weights, particularly on a structured dataset like MNIST. Here is a simplified view of how I applied the GA to this problem:
if (doTraining)
{
    // Same 784-30-10 topology, but the weights are evolved with a GA
    // instead of being updated by gradient descent.
    nn1 = std::make_unique<nn::ANN_MLP_GA<float>>(std::vector<size_t>{784, 30, 10});
    nn1->TrainGA(images, labels, generations, populationSize, mutationRate);
    nn1->Serialize(archiveFile);
}
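To make the evolutionary loop concrete, here is a self-contained sketch of what a single GA generation typically does to a population of flattened weight vectors: rank candidates by fitness, keep the best half, and fill the rest with children produced by crossover and mutation. It illustrates the general technique under my own assumptions, not the internals of ANN_MLP_GA:

#include <algorithm>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// One illustrative GA generation over flattened weight vectors.
// 'fitness' scores each candidate, e.g. accuracy on a batch of images.
std::vector<std::vector<float>> NextGeneration(
    const std::vector<std::vector<float>>& population,
    const std::function<float(const std::vector<float>&)>& fitness,
    float mutationRate, std::mt19937& rng)
{
    // Selection: rank candidates by fitness, highest first.
    std::vector<float> scores(population.size());
    for (size_t i = 0; i < population.size(); ++i)
        scores[i] = fitness(population[i]);
    std::vector<size_t> order(population.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return scores[a] > scores[b]; });

    const size_t parents = population.size() / 2;
    std::uniform_int_distribution<size_t> pick(0, parents - 1);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);
    std::normal_distribution<float> noise(0.0f, 0.1f);

    // Elitism: the top half survives unchanged.
    std::vector<std::vector<float>> next;
    next.reserve(population.size());
    for (size_t i = 0; i < parents; ++i)
        next.push_back(population[order[i]]);

    // Crossover + mutation: children fill the remaining slots.
    while (next.size() < population.size())
    {
        const auto& a = population[order[pick(rng)]];
        const auto& b = population[order[pick(rng)]];
        std::vector<float> child(a.size());
        for (size_t w = 0; w < child.size(); ++w)
        {
            child[w] = (unit(rng) < 0.5f) ? a[w] : b[w];   // uniform crossover
            if (unit(rng) < mutationRate)
                child[w] += noise(rng);                    // Gaussian mutation
        }
        next.push_back(std::move(child));
    }
    return next;
}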
Conclusion
Through my work on training neural networks for the MNIST dataset, I found that while Genetic Algorithms offer an alternative approach, Stochastic Gradient Descent is by far the more effective method for a task like digit classification. The simplicity and efficiency of SGD make it well suited to this type of image data, reaching high accuracy in very few epochs. For more insights into this topic, you can find the details here.
The code for these programs is available on GitHub here.