Feedforward neural networks (FNNs) in detail
In my latest exploration of feedforward neural networks (FNNs), I walk through the fundamentals of this architecture, a powerful, memoryless tool where information flows in a strictly unidirectional manner. Unlike recurrent neural networks (RNNs), which use feedback loops, FNNs pass information forward without revisiting any layers. This simplicity allows FNNs to be efficient for a wide range of computational tasks, particularly when dealing with static data, like classification and regression.
Feedforward Neural Networks (FNNs) are the simplest type of artificial neural networks where the information moves in only one direction—forward—from the input nodes, through the hidden nodes (if any), and finally to the output nodes. There are no cycles or loops in the network; the output of any layer does not affect the same layer or the preceding layers. This straightforward flow of data makes FNNs easier to understand and implement compared to other neural network architectures.
An FNN typically consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of neurons (also known as nodes or units), and each neuron in one layer is connected to every neuron in the next layer through weighted connections. The neurons process the input they receive by applying a weighted sum followed by a non-linear activation function, such as the sigmoid or ReLU (Rectified Linear Unit) function. The activation function introduces non-linearity into the network, enabling it to learn complex patterns in the data.
During the training phase, the network adjusts its weights based on the difference between the predicted output and the actual output using a method called backpropagation coupled with an optimization algorithm like gradient descent. The goal is to minimize a loss function that quantifies the error in the network’s predictions. Despite their simplicity, FNNs are powerful tools for solving problems like classification, regression, and pattern recognition when the data relationships are straightforward.
Network structure
The structure of an FNN consists of three main layers: the input layer, hidden layers, and output layer. Each neuron in the network is fully connected to neurons in adjacent layers. The input layer receives raw data, which is passed sequentially through the hidden layers, where transformations occur via weighted connections. The final output is produced in the output layer, whether it be a classification label or a continuous regression value.
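As a concrete illustration, here is a minimal NumPy sketch of such a fully connected forward pass; the layer sizes, the random weights, and the use of ReLU in the hidden layer are arbitrary choices made for illustration, not a reference implementation.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical dimensions: 4 inputs, 8 hidden units, 3 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def forward(x):
    # Each layer computes a weighted sum plus bias, followed by a non-linearity.
    h = relu(W1 @ x + b1)   # hidden layer
    return W2 @ h + b2      # output layer (raw scores or regression values)

print(forward(np.array([0.5, -1.2, 3.0, 0.7])))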
Activation functions
To introduce non-linearity, activation functions are applied to neurons in the hidden and output layers. Without these functions, FNNs would only learn linear relationships, severely limiting their ability to model complex data. I use several activation functions depending on the task.
The sigmoid function is defined as:
f(x) = \frac{1}{1 + e^{-x}}
It is particularly useful in binary classification tasks, as it constrains outputs to the range between 0 and 1. However, in deeper networks the sigmoid saturates for large positive or negative inputs, and the resulting vanishing gradients make it a less suitable choice.
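A direct NumPy translation of this formula (just a sketch mirroring the definition above) could be:

import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))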
The tanh function, given by:
f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
maps inputs to a range between -1 and 1, providing stronger gradients than the sigmoid function, which can improve training in some cases.
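Written out in the same style, the tanh definition becomes the sketch below (in practice np.tanh does the same job):

import numpy as np

def tanh(x):
    # Equivalent to np.tanh(x); spelled out to mirror the formula above.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))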
A commonly used function in modern architectures is ReLU (Rectified Linear Unit), which is defined as:
f(x) = \max(0, x)
ReLU sets negative inputs to zero while leaving positive values unchanged, preventing vanishing gradients and making training faster and more efficient.
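In code, ReLU is a one-liner; the sketch below mirrors the piecewise definition:

import numpy as np

def relu(x):
    # Zero for negative inputs, identity for positive ones.
    return np.maximum(0.0, x)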
For multi-class classification tasks, the output layer typically uses the softmax function, expressed as:
f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
This function outputs probabilities for each class, with the sum of the probabilities equal to 1, making it ideal for classification tasks with multiple categories.
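A small NumPy sketch of softmax is shown below; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result, since softmax is invariant to shifting all inputs by the same constant.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()          # probabilities that sum to 1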
Training process
Training a feedforward neural network involves adjusting weights and biases to minimize the error between predicted outputs and actual target values. I use backpropagation to compute the gradients of the loss function with respect to the network’s weights, applying the chain rule. The loss function itself depends on the task: Mean Squared Error (MSE) for regression, and Cross-Entropy Loss for classification. The Mean Squared Error is calculated as:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2
where y_i is the actual target value and \hat{y_i} is the predicted output.
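In NumPy this loss is a single expression; the sketch below assumes y_true and y_pred are arrays of the same shape:

import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)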
Once the gradients are calculated, I apply an optimization algorithm such as Stochastic Gradient Descent (SGD) or Adam to update the weights. The general weight update rule in gradient descent is given by:
w = w - \eta \cdot \nabla L
where w is the weight being updated, \eta is the learning rate, and \nabla L is the gradient of the loss function with respect to the weight. This process repeats over multiple epochs until the network’s performance converges.
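To make the update rule concrete, the toy loop below fits a single linear neuron to synthetic data with plain gradient descent on an MSE loss; the data, learning rate, and epoch count are arbitrary illustrative choices, not a recipe.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                          # toy targets

w = np.zeros(3)                         # weights to learn
eta = 0.1                               # learning rate

for epoch in range(200):
    y_hat = X @ w
    grad = (2.0 / len(X)) * X.T @ (y_hat - y)   # gradient of the MSE loss w.r.t. w
    w = w - eta * grad                          # the update rule w = w - eta * grad L

print(w)   # converges towards true_w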
Alternative training with genetic algorithms
In addition to gradient-based methods, I occasionally explore genetic algorithms (GAs) for optimizing FNNs. GAs do not rely on gradients but instead evolve the network’s parameters by treating them as individuals in a population. Over several generations, better-performing networks are selected for crossover and mutation, gradually improving the network’s performance.
This approach is particularly useful when the error surface is highly irregular or contains many local minima that can trap gradient-based methods. While generally slower and more computationally expensive than backpropagation, GAs remain a valuable alternative in such cases.
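As a rough sketch of this idea, the loop below evolves a population of weight vectors for a single linear neuron; the fitness function, population size, selection scheme, and mutation scale are all placeholder assumptions rather than a tuned implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem reused as the fitness landscape.
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5])

def fitness(w):
    # Negative MSE of a single linear neuron: higher is better.
    return -np.mean((X @ w - y) ** 2)

pop = rng.normal(size=(30, 3))                      # population of weight vectors
for generation in range(100):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]         # keep the fittest individuals
    children = []
    for _ in range(len(pop)):
        a = parents[rng.integers(len(parents))]
        b = parents[rng.integers(len(parents))]
        mask = rng.random(3) < 0.5                  # uniform crossover
        child = np.where(mask, a, b) + rng.normal(scale=0.1, size=3)   # mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(w) for w in pop])]
print(best)   # approaches the true weights over generations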
Conclusion
Feedforward neural networks remain an essential architecture for tasks involving static data. From my experience, their simplicity and efficiency make them well suited to classification and regression. By carefully selecting activation functions and optimization techniques, I can tailor these networks to the problem at hand. While gradient descent remains the most common optimization approach, genetic algorithms provide an interesting, though computationally heavy, alternative for training FNNs in certain challenging cases.