Building a feedforward neural network in C++: GA class
In my implementation of feedforward neural networks (FNNs) using genetic algorithms (GA), the ANN_MLP_GA class is derived from the base ANN_MLP class. While the base class handles the network's structure, weights, biases, and activation functions, the ANN_MLP_GA class adds the components needed to manage a population of networks, which is essential for GA-based optimization.
Unlike the traditional stochastic gradient descent (SGD) method, where a single network is trained through iterative weight updates, genetic algorithms work by evolving a population of networks. Each population member has its own set of weights and biases, which evolve over multiple generations to optimize performance.
Class overview
The ANN_MLP_GA class manages this population, maintaining vectors of per-member weights (vWeightsPop) and biases (vBiasesPop). Since GA keeps a full copy of the network's parameters for every population member, memory management becomes a crucial consideration: each member needs its own set of weights and biases, which increases memory usage significantly compared to the SGD method.
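To make the memory layout concrete, here is a minimal sketch of the extra data members; the matrix type name la::Matrix and the exact declarations are assumptions inferred from the snippets below, not the actual header:

template <typename T>
class ANN_MLP_GA : public ANN_MLP<T>
{
protected:
    // One entry per population member, one matrix per layer transition,
    // mirroring the single vWeights/vBiases of the base class.
    std::vector<std::vector<la::Matrix<T>>> vWeightsPop; // [member][layer]
    std::vector<std::vector<la::Matrix<T>>> vBiasesPop;  // [member][layer]
    std::vector<std::vector<la::Matrix<T>>> vNaPop;      // per-member activations
    std::vector<T> f_;                                   // fitness score per member
    size_t nPopSize;                                     // population size
    size_t nTop;                                         // top performers retained
};

With nPopSize members, the parameter storage is roughly nPopSize times that of a single SGD-trained network.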
The ANN_MLP_GA class includes methods such as CreatePopulation and AllocatePopulation to initialize and manage this population. These methods can either generate a fixed population or create a mixed population, with an option to retain the top performers of the previous generation. The UpdateWeightsAndBiases method transfers the weights and biases of a population member into the main network for feedforward computations.
void UpdateWeightsAndBiases(const std::vector<size_t>& v_);
void TrainGA(const std::vector<std::vector<T>>& data,
             const std::vector<std::vector<T>>& reference,
             size_t nGenerations, size_t BatchSize,
             bool shuffleTrainingData = true);
int TestGA(const std::vector<std::vector<T>>& data,
           const std::vector<std::vector<T>>& reference);
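The body of UpdateWeightsAndBiases is not shown in this post; a plausible sketch, assuming v_ carries the population indices sorted by fitness and that vWeights/vBiases hold the nTop retained parameter sets (which is how CreatePopulationMixed reads them below), is:

template <typename T>
void nn::ANN_MLP_GA<T>::UpdateWeightsAndBiases(const std::vector<size_t>& v_)
{
    // Copy the nTop best-performing members' parameters into the retained
    // sets; index 0 then holds the best network, used by TestGA below.
    for (size_t i = 0; i < nTop; ++i)
        for (size_t k = 0; k < nLayers - 1; ++k)
        {
            vWeights[i][k].assign(vWeightsPop[v_[i]][k].data());
            vBiases[i][k].assign(vBiasesPop[v_[i]][k].data());
        }
}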
Genetic algorithm training
In the genetic algorithm approach, the TrainGA method is the core function that evolves the network population over generations. Each generation is tested, the top performers are selected, and crossover and mutation are applied to produce the next population.
The CreatePopulationFixed method builds new members from fixed linear combinations of top performers, while the CreatePopulationMixed method randomly selects two top performers and creates a new population member by combining their weights and biases. This lets the algorithm explore the search space by introducing variation while retaining the best solutions.
// Excerpt from TrainGA. s_ (the sample-index vector) is filled once with
// std::iota before this loop; g_ is the random engine.
for (size_t i = 0; i < nGenerations; ++i)
{
    // Reshuffle the training data every 10 generations.
    if (shuffleTrainingData && !(i % 10))
    {
        std::cout << "Reshuffling training data\n";
        std::shuffle(s_.begin(), s_.end(), g_);
    }
    // Build the new generation from the current top performers.
    CreatePopulation();
    // Reset all fitness scores before evaluating the new generation.
    std::fill(f_.begin(), f_.end(), 0);
    for (size_t j = 0; j < std::min(s_.size(), BatchSize); ++j)
    {
        it_ = s_.begin() + (ptrdiff_t)j;
        na_[0].assign(data[*it_]);
        // Feed the sample forward through every population member.
        for (size_t k = 0; k < nPopSize; ++k)
        {
            vNaPop[k][0].assign(data[*it_]);
            for (size_t l = 1; l < nLayers; ++l)
            {
                // a_l = act(W_{l-1} * a_{l-1} + b_{l-1}) for member k.
                la::MatMultVec(vNaPop[k][l], vWeightsPop[k][l - 1], vNaPop[k][l - 1]);
                vNaPop[k][l] += vBiasesPop[k][l - 1];
                nn::ActFunc(vNaPop[k][l], pAct);
            }
            // Fitness update against reference[*it_] elided here.
        }
    }
    // Sorting by fitness and the call to UpdateWeightsAndBiases follow here.
}
Each population member is tested against the dataset in mini-batches. The fitness of each network is updated based on its classification performance, and the population is sorted by fitness. Top performers are used to update the primary network parameters for the next generation.
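The fitness bookkeeping is elided from the excerpt above. A minimal sketch, assuming one-hot reference vectors and that f_ holds one score per member, could look like this; the per-sample part belongs inside the batch loop, the selection part at the end of each generation:

// Per-sample fitness update: reward members that classify the sample correctly.
for (size_t k = 0; k < nPopSize; ++k)
{
    // Index of the strongest output neuron of member k.
    size_t maxPos = 0;
    for (size_t l = 1; l < vNaPop[k][nLayers - 1].GetRowsNb(); ++l)
        if (vNaPop[k][nLayers - 1][l][0] > vNaPop[k][nLayers - 1][maxPos][0])
            maxPos = l;
    if (reference[*it_][maxPos] > 0) f_[k] += 1;
}
// End of generation: rank members by fitness (best first) and promote the top ones.
std::vector<size_t> idx_(nPopSize);
std::iota(idx_.begin(), idx_.end(), 0);
std::sort(idx_.begin(), idx_.end(),
          [&](size_t a, size_t b) { return f_[a] > f_[b]; });
UpdateWeightsAndBiases(idx_);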
Population creation
The CreatePopulation method can initialize the population in two ways: fixed linear combinations or mixed combinations of top performers. The CreatePopulationFixed method averages the weights and biases of two top performers to create new population members, while the CreatePopulationMixed method selects weights randomly from two top individuals. Both methods introduce mutation to the newly created members, which ensures diversity in the population and prevents premature convergence.
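Only the mixed variant is reproduced in this post. For the fixed variant, a hypothetical sketch based on the description above might look like this; the deterministic pairing of parents is an illustrative choice, not the actual implementation:

template <typename T>
void nn::ANN_MLP_GA<T>::CreatePopulationFixed(bool bKeepPrevious)
{
    size_t count_ = 0;
    if (bKeepPrevious)
        for (; count_ < nTop; ++count_)
            for (size_t k = 0; k < nLayers - 1; ++k)
            {
                vBiasesPop[count_][k].assign(vBiases[count_][k].data());
                vWeightsPop[count_][k].assign(vWeights[count_][k].data());
            }
    for (size_t i = count_; i < nPopSize; ++i)
    {
        // Fixed pairing of two top performers (illustrative).
        const size_t a_ = i % nTop, b_ = (i + 1) % nTop;
        for (size_t j = 0; j < nLayers - 1; ++j)
        {
            // Average the two parents' parameters: 0.5 * (A + B).
            vWeightsPop[i][j].assign(vWeights[a_][j].data());
            vWeightsPop[i][j] += vWeights[b_][j];
            vWeightsPop[i][j] *= static_cast<T>(0.5);
            vBiasesPop[i][j].assign(vBiases[a_][j].data());
            vBiasesPop[i][j] += vBiases[b_][j];
            vBiasesPop[i][j] *= static_cast<T>(0.5);
        }
    }
}

The mixed variant, as implemented, follows: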
template <typename T>
void nn::ANN_MLP_GA<T>::CreatePopulationMixed(bool bKeepPrevious)
{
    size_t a_ = 0, b_ = 0, count_ = 0;
    // Optionally carry the nTop best parameter sets over unchanged (elitism).
    if (bKeepPrevious)
        for (size_t i = 0; i < nTop; ++i)
        {
            for (size_t k = 0; k < nLayers - 1; ++k)
            {
                vBiasesPop[i][k].assign(vBiases[i][k].data());
                vWeightsPop[i][k].assign(vWeights[i][k].data());
            }
            count_++;
        }
    // Fill the rest of the population by mixing two distinct top performers.
    for (size_t i = count_; i < nPopSize; ++i)
    {
        a_ = static_cast<size_t>(GetRandomUniformInt() % static_cast<int>(nTop));
        b_ = static_cast<size_t>(GetRandomUniformInt() % static_cast<int>(nTop));
        while (a_ == b_)  // requires nTop > 1 to terminate
            b_ = static_cast<size_t>(GetRandomUniformInt() % static_cast<int>(nTop));
        for (size_t j = 0; j < nLayers - 1; ++j)
        {
            // Per-layer crossover: inherit the whole layer from one of the two parents.
            const size_t p = (GetRandomUniformReal() < 0.5) ? a_ : b_;
            // Scale the inherited parameters down slightly (perturbation step).
            vBiasesPop[i][j].assign(vBiases[p][j].data());
            vBiasesPop[i][j] *= static_cast<T>(0.9);
            vWeightsPop[i][j].assign(vWeights[p][j].data());
            vWeightsPop[i][j] *= static_cast<T>(0.9);
        }
    }
}
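In the excerpt above the inherited parameters are only scaled by 0.9; the random mutation mentioned earlier is a separate step. A hypothetical sketch of such a step is shown here, perturbing the bias vectors (the weight matrices would be treated the same way given an element accessor); the function name, rate, and amplitude are illustrative, not the actual implementation:

template <typename T>
void nn::ANN_MLP_GA<T>::MutateMember(size_t i)
{
    const double rate = 0.05;           // fraction of elements to perturb
    const T amp = static_cast<T>(0.1);  // perturbation amplitude
    for (size_t j = 0; j < nLayers - 1; ++j)
        for (size_t r = 0; r < vBiasesPop[i][j].GetRowsNb(); ++r)
            if (GetRandomUniformReal() < rate)
                // Add uniform noise in [-amp, +amp] to the selected element.
                vBiasesPop[i][j][r][0] +=
                    amp * static_cast<T>(2 * GetRandomUniformReal() - 1);
}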
Testing
The TestGA method evaluates the trained network (the best performer, stored at index 0) by running the feedforward pass for each test input, comparing the predicted output with the reference output, and counting correct predictions. As with SGD testing, this yields an accuracy measure for the GA-trained network.
template <typename T>
int nn::ANN_MLP_GA<T>::TestGA(const std::vector<std::vector<T>>& data,
                              const std::vector<std::vector<T>>& reference)
{
    int iCorrect = 0;
    for (size_t i = 0; i < data.size(); ++i)
    {
        // Load the test sample into the input layer.
        for (size_t l = 0; l < na_[0].GetRowsNb(); ++l) na_[0][l][0] = data[i][l];
        // Feedforward pass using the best performer's parameters (index 0).
        for (size_t j = 1; j < nLayers; ++j)
        {
            la::MatMultVec(na_[j], vWeights[0][j - 1], na_[j - 1]);
            na_[j] += vBiases[0][j - 1];
            nn::ActFunc(na_[j], pAct);
        }
        // Index of the strongest output neuron.
        size_t maxPos = 0;
        for (size_t l = 1; l < na_[nLayers - 1].GetRowsNb(); ++l)
            if (na_[nLayers - 1][l][0] > na_[nLayers - 1][maxPos][0]) maxPos = l;
        // Index of the expected class in the one-hot reference vector.
        const ptrdiff_t refPos = std::distance(reference[i].begin(),
            std::max_element(reference[i].begin(), reference[i].end()));
        if (static_cast<ptrdiff_t>(maxPos) == refPos) iCorrect++;
    }
    return iCorrect;
}
Conclusion
The ANN_MLP_GA class leverages genetic algorithms to optimize a population of feedforward neural networks, allowing exploration of a larger solution space than traditional gradient-based methods. For more insights into this topic, you can find the details here.
The code for this implementation is available on GitHub here.