Building a feedforward neural network in C++: SGD class

In my extended implementation of the base class ANN_MLP, the ANN_MLP_SGD class focuses on training feedforward neural networks with stochastic gradient descent (SGD). The ANN_MLP class already provides the core structure for managing the network's parameters, such as weights, biases, and layers, and handles data persistence through serialization. The ANN_MLP_SGD class adds the logic for training the network on mini-batches and updating the parameters from the computed gradients.

Class overview

The ANN_MLP_SGD class inherits from the base class and adds methods for training and testing the network with SGD. Two key methods, TrainSGD and TestSGD, handle the training and evaluation process; they take the input data, the reference outputs, and the training parameters such as the number of epochs, the mini-batch size, and the learning rate.


void TrainSGD(const std::vector<std::vector<T>>& data,
              const std::vector<std::vector<T>>& reference,
              size_t epochs, size_t miniBatchSize, double eta);

int TestSGD(const std::vector<std::vector<T>>& data,
            const std::vector<std::vector<T>>& reference);

The TrainSGD method processes mini-batches of training data, updating the network’s weights and biases incrementally. The TestSGD method evaluates the trained model by comparing predicted outputs to reference outputs and counting the number of correct predictions.
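To give an idea of how the two methods fit together, here is a hypothetical usage sketch. The constructor call and the layer sizes are assumptions for illustration only; the real constructor signature is not shown in this article and may differ.


#include <iostream>
#include <vector>
// plus the header that declares nn::ANN_MLP_SGD

int main()
{
    // Data sets with one-hot encoded reference outputs (loading is omitted)
    std::vector<std::vector<double>> trainData, trainRef;
    std::vector<std::vector<double>> testData, testRef;

    // Assumed constructor taking the layer sizes; the real signature may differ
    nn::ANN_MLP_SGD<double> net({784, 30, 10});

    // 30 epochs, mini-batches of 10 samples, learning rate 3.0
    net.TrainSGD(trainData, trainRef, 30, 10, 3.0);

    const int correct = net.TestSGD(testData, testRef);
    std::cout << "Accuracy: " << 100.0 * correct / testData.size() << " %\n";
    return 0;
}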

Training the network

The TrainSGD method implements stochastic gradient descent: it divides the dataset into mini-batches, computes the gradients for each batch, and updates the network's parameters accordingly. At the start of each epoch the training data is shuffled, so that each epoch visits the samples in a different, random order.
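The shuffle itself can be done with std::shuffle over a vector of sample indices. The following is only a sketch: the helper name ShuffledIndices is illustrative, while s_ and it_ in the snippet below would hold such a shuffled index vector and an iterator walking through it.


#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Return a freshly shuffled list of sample indices for one epoch
std::vector<size_t> ShuffledIndices(size_t nSamples)
{
    std::vector<size_t> s(nSamples);
    std::iota(s.begin(), s.end(), 0);            // 0, 1, ..., nSamples - 1

    static std::mt19937 gen(std::random_device{}());
    std::shuffle(s.begin(), s.end(), gen);
    return s;
}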


for (size_t j = 0; j + miniBatchSize <= s_.size(); j += miniBatchSize)
{
    // Zero the accumulated gradients for each mini-batch
    for (size_t l = 0; l < nLayers - 1; ++l)
    {
        nb_[l].Zeros();
        nw_[l].Zeros();
    }

    // Forward propagation for each sample in the mini-batch
    for (size_t k = 0; k < miniBatchSize; ++k)
    {
        na_[0].assign(data[*it_]);
        for (size_t l = 1; l < nLayers; ++l)
        {
            // z = W * a_prev + b, then a = activation(z)
            la::MatMultVec(nzv_[l - 1], vWeights[0][l - 1], na_[l - 1]);
            nzv_[l - 1] += vBiases[0][l - 1];
            nn::ActFunc(na_[l], nzv_[l - 1], pAct);
        }

        // ... backpropagation for this sample accumulates into nb_ and nw_ (see below) ...

        ++it_;   // advance to the next shuffled sample
    }

    // ... the weights and biases are then updated with the accumulated gradients ...
}

The method performs forward propagation for each sample in the mini-batch: at every layer it computes the weighted sum of the previous layer's activations, adds the biases, and applies the selected activation function (either sigmoid or tanh) to produce the activations of the next layer.
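The activation is selected through the function pointer pAct (with pDAct holding its derivative for backpropagation). As a sketch of the two element-wise options mentioned above; the function names here are illustrative and not taken from the library:


#include <cmath>

// Sigmoid and its derivative, applied element by element
double Sigmoid(double z)  { return 1.0 / (1.0 + std::exp(-z)); }
double DSigmoid(double z) { const double s = Sigmoid(z); return s * (1.0 - s); }

// Tanh and its derivative, the alternative mentioned above
double TanhAct(double z)  { return std::tanh(z); }
double DTanhAct(double z) { const double t = std::tanh(z); return 1.0 - t * t; }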

Once forward propagation is complete, backpropagation begins: the error at the output layer is computed first, and the gradients of the weights and biases are then obtained by propagating it backward through the layers.

Backpropagation

Backpropagation is the key to computing the gradients needed to update the network’s weights and biases. The process starts with calculating the delta at the output layer and propagating this error backward through the hidden layers.
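In the usual notation, and assuming the quadratic cost that the code below implies, the quantities being computed are

\[
\delta^{L} = \left(a^{L} - y\right) \odot \sigma'\!\left(z^{L}\right),
\qquad
\delta^{l} = \left(\left(W^{l+1}\right)^{T} \delta^{l+1}\right) \odot \sigma'\!\left(z^{l}\right),
\]
\[
\frac{\partial C}{\partial b^{l}} = \delta^{l},
\qquad
\frac{\partial C}{\partial W^{l}} = \delta^{l} \left(a^{l-1}\right)^{T},
\]

where \(a^{L}\) are the output activations, \(y\) is the reference output, \(z^{l}\) are the weighted inputs, and \(\odot\) is the element-wise (Hadamard) product. In the code these correspond to na_, reference, nzv_ and MatHadamard, with the deltas stored in dno_ and the gradients in dnb_ and dnw_.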


// Compute the delta at the output layer: delta = (a - y) (.) sigma'(z)
dno2_[osize] = na_[osize + 1];
dno2_[osize] -= reference[*it_];
nn::ActFunc(nzv_[osize], pDAct);                   // apply the activation derivative in place
MatHadamard(dno_[osize], dno2_[osize], nzv_[osize]);

// Gradients of the weights and biases at the output layer
dnb_[osize] = dno_[osize];
MatOuter(dnw_[osize], dno_[osize], na_[osize]);

// Propagate the delta backward through the hidden layers
for (size_t l = nLayers - 2; l > 0; --l)
{
    nn::ActFunc(nzv_[l - 1], pDAct);                // sigma'(z) of the previous layer
    la::MatMultVec(dno2_[l - 1], wt_[l], dno_[l]);  // transpose(W) * delta
    MatHadamard(dno_[l - 1], dno2_[l - 1], nzv_[l - 1]);
    dnb_[l - 1] = dno_[l - 1];
    MatOuter(dnw_[l - 1], dno_[l - 1], na_[l - 1]);
}

The per-sample gradients are stored and accumulated, and the accumulated values are used to update the weights and biases once the mini-batch is complete. The learning rate (eta) controls the size of these updates and therefore how quickly the network moves towards a minimum of the cost.
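As a sketch of that update step, assuming nw_ and nb_ hold the gradients summed over the mini-batch and that the matrix class supports the scalar arithmetic shown here (which is not visible in the snippets above):


// Gradient-descent step applied once per mini-batch
const double scale = eta / static_cast<double>(miniBatchSize);
for (size_t l = 0; l < nLayers - 1; ++l)
{
    vWeights[0][l] -= nw_[l] * scale;   // W  <-  W - (eta / m) * accumulated gradient
    vBiases[0][l]  -= nb_[l] * scale;   // b  <-  b - (eta / m) * accumulated gradient
}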

Testing the network

After training, the TestSGD method evaluates the network’s performance on test data. The method runs each test sample through the network, compares the predicted output with the reference output, and counts the number of correct predictions.


template <typename T>
int nn::ANN_MLP_SGD<T>::TestSGD(const std::vector<std::vector<T>>& data,
                                const std::vector<std::vector<T>>& reference)
{
    int iCorrect = 0;
    for (size_t i = 0; i < data.size(); ++i)
    {
        // Load the sample into the input layer
        for (size_t l = 0; l < na_[0].GetRowsNb(); ++l) na_[0][l][0] = data[i][l];

        // Forward propagation through the network
        for (size_t j = 1; j < nLayers; ++j)
        {
            la::MatMultVec(na_[j], vWeights[0][j - 1], na_[j - 1]);
            na_[j] += vBiases[0][j - 1];
            nn::ActFunc(na_[j], pAct);
        }

        // Copy the output layer into a plain vector and pick the strongest activation
        std::vector<T> res(na_[nLayers - 1].GetRowsNb());
        for (size_t l = 0; l < res.size(); ++l) res[l] = na_[nLayers - 1][l][0];

        const ptrdiff_t maxPos = std::distance(res.begin(), std::max_element(res.begin(), res.end()));
        const ptrdiff_t refPos = std::distance(reference[i].begin(),
                                               std::max_element(reference[i].begin(), reference[i].end()));
        if (maxPos == refPos) iCorrect++;
    }
    return iCorrect;
}

By counting the correct predictions, this method provides an overall accuracy score for the trained network, helping assess its effectiveness on unseen data.

Conclusion

The ANN_MLP_SGD class extends the base class to provide a full implementation of stochastic gradient descent for training and testing feedforward neural networks. Through mini-batch updates and backpropagation, the network learns from data and adapts its parameters to minimize the error over time. For more insights into this topic, you can find the details here.

The code for this implementation is available on GitHub here.