Building a feedforward neural network in C++: SGD class
In my extended implementation of the base class ANN_MLP, the ANN_MLP_SGD class focuses on training feedforward neural networks using stochastic gradient descent (SGD). The ANN_MLP class already provides the core structure for managing the network's parameters, such as weights, biases, and layers, and handles data persistence through serialization. The ANN_MLP_SGD class adds the logic for training the network with mini-batches and updating the parameters from the computed gradients.
Class overview
The ANN_MLP_SGD class inherits the properties of the base class and adds specific methods for training and testing the network with SGD. Two key methods, TrainSGD and TestSGD, handle the training and evaluation process; they take the input data, the reference outputs, and training parameters such as the number of epochs, the mini-batch size, and the learning rate.
void TrainSGD(const std::vector<std::vector<T>>& data,
              const std::vector<std::vector<T>>& reference,
              size_t epochs, size_t miniBatchSize, double eta);

int TestSGD(const std::vector<std::vector<T>>& data,
            const std::vector<std::vector<T>>& reference);
The TrainSGD method processes mini-batches of training data, updating the network's weights and biases incrementally. The TestSGD method evaluates the trained model by comparing the predicted outputs with the reference outputs and counting the number of correct predictions.
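Putting the pieces together, a simplified skeleton of the class might look like the sketch below. Only the SGD-specific interface is shown; the member variables, serialization, and activation handling live in the base class. The nn namespace and the template parameter T follow the snippets in this article, while the rest of the skeleton is an assumption about the actual code, which contains additional detail.

namespace nn {

template <typename T>
class ANN_MLP_SGD : public ANN_MLP<T>
{
public:
    using ANN_MLP<T>::ANN_MLP;  // reuse the base-class constructors (assumption)

    // Train with stochastic gradient descent over shuffled mini-batches
    void TrainSGD(const std::vector<std::vector<T>>& data,
                  const std::vector<std::vector<T>>& reference,
                  size_t epochs, size_t miniBatchSize, double eta);

    // Run the test set through the network and count correct predictions
    int TestSGD(const std::vector<std::vector<T>>& data,
                const std::vector<std::vector<T>>& reference);
};

} // namespace nn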
Training the network
In the TrainSGD method, the dataset is divided into mini-batches, the gradients are computed for each batch, and the network's parameters are updated accordingly. The method begins by shuffling the training data at the start of each epoch, which keeps the composition of the mini-batches random from one epoch to the next.
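A minimal sketch of that per-epoch shuffling, assuming s_ is a std::vector<size_t> of sample indices and it_ a member iterator over it (both appear in the loop below), could look like this:

#include <algorithm>
#include <numeric>
#include <random>

// Sketch (assumption about the surrounding code): s_ holds one index per
// training sample and is reshuffled at the start of every epoch.
std::iota(s_.begin(), s_.end(), size_t{0});     // 0, 1, ..., N-1
std::mt19937 gen(std::random_device{}());
for (size_t epoch = 0; epoch < epochs; ++epoch)
{
    std::shuffle(s_.begin(), s_.end(), gen);    // randomize the sample order
    it_ = s_.begin();                           // restart the sample iterator
    // ... mini-batch loop runs here ...
}

The mini-batch loop itself then steps through the shuffled indices in chunks of miniBatchSize: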
for (size_t j = 0; j < s_.size() - miniBatchSize; j += miniBatchSize)
{
    // Zero the accumulated gradients for this mini-batch
    for (size_t l = 0; l < nLayers - 1; ++l)
    {
        nb_[l].Zeros();
        nw_[l].Zeros();
    }

    // Forward propagation for each sample in the mini-batch
    for (size_t k = 0; k < miniBatchSize; ++k)
    {
        // Load the current (shuffled) sample into the input layer
        na_[0].assign(data[*it_]);

        // Propagate through the layers: z = W*a + b, then a = act(z)
        for (size_t l = 1; l < nLayers; ++l)
        {
            la::MatMultVec(nzv_[l - 1], vWeights[0][l - 1], na_[l - 1]);
            nzv_[l - 1] += vBiases[0][l - 1];
            nn::ActFunc(na_[l], nzv_[l - 1], pAct);
        }

        // ... backpropagation for this sample (shown in the next section)
        // and the advance of the sample iterator it_ are omitted here ...
    }

    // ... after the mini-batch, the accumulated gradients are applied ...
}
The method performs forward propagation for each sample in the mini-batch by computing the weighted sums of the inputs and applying the selected activation function (either sigmoid or tanh). The resulting activations are then passed on to the next layer, repeating until the output layer is reached.
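As an aside, a hypothetical standalone version of such an elementwise activation might look like the following. This is an illustration only, not the library's actual nn::ActFunc, which operates on the matrix/vector types used above.

#include <cmath>
#include <vector>

// Hypothetical standalone activations (illustration only)
template <typename T>
T Sigmoid(T z) { return T(1) / (T(1) + std::exp(-z)); }

template <typename T>
T Tanh(T z) { return std::tanh(z); }

// Apply an activation elementwise to the weighted sums z of one layer
template <typename T>
void ApplyActivation(std::vector<T>& a, const std::vector<T>& z, T (*act)(T))
{
    a.resize(z.size());
    for (size_t i = 0; i < z.size(); ++i) a[i] = act(z[i]);
}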
Once forward propagation is complete, backpropagation begins: the error at the output layer is computed, and the gradients of the weights and biases are then calculated in a backward pass through the layers.
Backpropagation
Backpropagation is the key to computing the gradients needed to update the network’s weights and biases. The process starts with calculating the delta at the output layer and propagating this error backward through the hidden layers.
// Compute the delta at the output layer: (a_L - y), multiplied
// elementwise by the derivative of the activation act'(z_L)
dno2_[osize] = na_[osize + 1];                       // network output a_L
dno2_[osize] -= reference[*it_];                     // subtract the target y
nn::ActFunc(nzv_[osize], pDAct);                     // act'(z_L), computed in place
MatHadamard(dno_[osize], dno2_[osize], nzv_[osize]); // elementwise product

// Propagate the error backward through the hidden layers and
// compute the per-sample gradients of the biases and weights
for (size_t l = nLayers - 2; l > 0; --l)
{
    nn::ActFunc(nzv_[l - 1], pDAct);                 // activation derivative, in place
    la::MatMultVec(dno2_[l - 1], wt_[l], dno_[l]);   // transposed weights times next delta
    MatHadamard(dno_[l - 1], dno2_[l - 1], nzv_[l - 1]);
    dnb_[l - 1] = dno_[l - 1];                       // gradient w.r.t. the biases
    MatOuter(dnw_[l - 1], dno_[l - 1], na_[l - 1]);  // gradient w.r.t. the weights
}
The gradients for each layer are stored, and the accumulated gradients are used to update the weights and biases at the end of each mini-batch. The learning rate (eta) scales the size of these updates, trading off how quickly the parameters move against the stability of the training.
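Putting this together, the per-mini-batch update corresponds to the usual averaged SGD step. The sketch below reuses the member names from the snippets above (nb_/nw_ for the accumulated gradients, dnb_/dnw_ for the per-sample gradients, vWeights/vBiases for the parameters); it is an assumption about the surrounding code, not the library's exact loop.

// After backpropagating one sample, its gradients are accumulated ...
for (size_t l = 0; l < nLayers - 1; ++l)
{
    nb_[l] += dnb_[l];
    nw_[l] += dnw_[l];
}

// ... and once the whole mini-batch has been processed, the parameters are
// moved against the averaged gradient, scaled by the learning rate eta:
//   W <- W - (eta / miniBatchSize) * sum(dW)
//   b <- b - (eta / miniBatchSize) * sum(db)
const double scale = eta / static_cast<double>(miniBatchSize);
for (size_t l = 0; l < nLayers - 1; ++l)
{
    nw_[l] *= scale;               // assumes the matrix type supports *= and -=
    nb_[l] *= scale;
    vWeights[0][l] -= nw_[l];
    vBiases[0][l]  -= nb_[l];
}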
Testing the network
After training, the TestSGD method evaluates the network's performance on test data. It runs each test sample through the network, compares the predicted output with the reference output, and counts the number of correct predictions.
template <typename T>
int nn::ANN_MLP_SGD<T>::TestSGD(const std::vector<std::vector<T>>& data,
                                const std::vector<std::vector<T>>& reference)
{
    int iCorrect = 0;
    for (size_t i = 0; i < data.size(); ++i)
    {
        // Load the test sample into the input layer
        for (size_t l = 0; l < na_[0].GetRowsNb(); ++l) na_[0][l][0] = data[i][l];

        // Forward pass: a_l = act(W_l * a_{l-1} + b_l)
        for (size_t j = 1; j < nLayers; ++j)
        {
            la::MatMultVec(na_[j], vWeights[0][j - 1], na_[j - 1]);
            na_[j] += vBiases[0][j - 1];
            nn::ActFunc(na_[j], pAct);
        }

        // Copy the output layer into a flat vector and take the position
        // of the maximum activation as the predicted class
        std::vector<T> res(na_[nLayers - 1].GetRowsNb());
        for (size_t l = 0; l < res.size(); ++l) res[l] = na_[nLayers - 1][l][0];

        const ptrdiff_t maxPos = std::distance(res.begin(),
            std::max_element(res.begin(), res.end()));
        const ptrdiff_t refPos = std::distance(reference[i].begin(),
            std::max_element(reference[i].begin(), reference[i].end()));
        if (maxPos == refPos) iCorrect++;
    }
    return iCorrect;
}
By counting the correct predictions, this method provides an overall measure of the trained network's performance on unseen data: dividing the returned count by the number of test samples gives the classification accuracy.
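As an illustration, a typical training-and-evaluation run could look like the sketch below. The constructor arguments (layer sizes) and the data-loading step are placeholders, not the library's documented interface, and the include of the repository's header is left as a comment.

#include <iostream>
#include <vector>
// #include the ANN_MLP_SGD header from the repository here

int main()
{
    std::vector<std::vector<float>> trainData, trainLabels;  // filled elsewhere
    std::vector<std::vector<float>> testData, testLabels;    // filled elsewhere

    // e.g. a 784-30-10 network for MNIST-like data (illustrative layer sizes;
    // the actual constructor signature is an assumption)
    nn::ANN_MLP_SGD<float> net({784, 30, 10});

    // 30 epochs, mini-batches of 10 samples, learning rate 3.0
    net.TrainSGD(trainData, trainLabels, 30, 10, 3.0);

    const int correct = net.TestSGD(testData, testLabels);
    std::cout << "Accuracy: "
              << 100.0 * correct / testData.size() << "%\n";
    return 0;
}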
Conclusion
The ANN_MLP_SGD class extends the base class to provide a full implementation of stochastic gradient descent for training and testing feedforward neural networks. Through mini-batch updates and backpropagation, the network learns from data and adapts its parameters to minimize the error over time. For more insights into this topic, you can find the details here.
The code for this implementation is available on GitHub here.