Short Definition
Feedforward Networks are neural networks in which information flows strictly from the input layer to the output layer, without cycles or feedback loops. Each layer transforms the output of the previous layer through an affine transformation (a weight matrix and a bias) followed by a nonlinear activation.
They are the most fundamental building block of deep learning systems.
Definition
A feedforward neural network computes a function by passing input data through a sequence of layers.
Given an input vector $x$, the network computes:

$$
y = f_L(f_{L-1}(\dots f_1(x)))
$$
Each layer performs the operation:

$$
h^{(l)} = \sigma\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)
$$
Where:
- $h^{(l)}$ = hidden representation at layer $l$, with $h^{(0)} = x$
- $W^{(l)}$ = weight matrix
- $b^{(l)}$ = bias vector
- $\sigma$ = nonlinear activation function
The first layer receives the input vector, and the final layer produces the output prediction.
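As a minimal sketch of this layer equation (NumPy assumed; the shapes and the `tanh` activation are illustrative, not prescribed by the definition above):

```python
import numpy as np

def layer(h_prev, W, b, activation=np.tanh):
    """One feedforward layer: h = sigma(W @ h_prev + b)."""
    return activation(W @ h_prev + b)

# Map a 3-dimensional input to a 4-dimensional hidden representation.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(4)
x = rng.normal(size=3)
h1 = layer(x, W, b)  # h^(1), shape (4,)
```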
Core Concept
Feedforward networks propagate information in a single direction:
Input → Hidden Layer → Hidden Layer → Output
There are no loops or recurrent connections.
This means the network does not maintain memory of previous inputs.
Minimal Conceptual Illustration
Example network with two hidden layers:
x → Linear → ReLU → Linear → ReLU → Linear → y
Or visually:
Input Layer
↓
Hidden Layer
↓
Hidden Layer
↓
Output Layer
Each layer learns progressively more abstract representations.
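A hedged NumPy sketch of the two-hidden-layer chain above (parameter shapes are left to the caller; leaving the output layer linear is a common choice for regression, not something mandated here):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, params):
    """x -> Linear -> ReLU -> Linear -> ReLU -> Linear -> y."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(W1 @ x + b1)   # first hidden layer
    h2 = relu(W2 @ h1 + b2)  # second hidden layer
    return W3 @ h2 + b3      # output layer, left linear here
```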
Multilayer Perceptron (MLP)
The most common type of feedforward network is the Multilayer Perceptron (MLP).
Characteristics:
- fully connected layers
- nonlinear activations
- trained using gradient descent
MLPs are used for:
- classification
- regression
- representation learning
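A minimal PyTorch sketch of such an MLP (the layer sizes are illustrative; a 784-dimensional input and 10 output classes are assumed purely for the example):

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 256),  # fully connected: input -> hidden 1
    nn.ReLU(),
    nn.Linear(256, 128),  # hidden 1 -> hidden 2
    nn.ReLU(),
    nn.Linear(128, 10),   # hidden 2 -> output logits
)
```

Calling `mlp` on a batched input of shape `(batch, 784)` returns logits of shape `(batch, 10)`.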
Universal Function Approximation
Feedforward networks have strong theoretical properties.
The Universal Approximation Theorem states that a network with a single hidden layer and a sufficient number of neurons can approximate any continuous function on a compact domain to arbitrary accuracy.
This makes feedforward networks powerful general-purpose models.
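One common formal statement (a standard paraphrase, assuming a suitable nonpolynomial activation $\sigma$): for every continuous $f$ on a compact set $K \subset \mathbb{R}^d$ and every $\varepsilon > 0$, there exist a width $N$ and parameters $\alpha_i, w_i, b_i$ such that

$$
\sup_{x \in K}\left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon .
$$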
Activation Functions
Nonlinear activation functions allow networks to model complex relationships.
Common choices include:
- ReLU
- Sigmoid
- Tanh
- GELU
Without nonlinear activations, the network would collapse into a single linear transformation.
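NumPy sketches of these activations (tanh is simply `np.tanh`; the GELU shown is the widely used tanh approximation), followed by a numerical check of the collapse claim:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gelu(x):
    # Tanh approximation of GELU, as used in several Transformer implementations.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Without a nonlinearity, two stacked linear layers collapse into one linear map:
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)
```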
Training
Feedforward networks are trained using backpropagation, which applies the chain rule to compute the gradient of the loss with respect to every parameter.
A loss function $\mathcal{L}(\theta)$ measures prediction error as a function of the model parameters.
Model parameters are updated using gradient descent:

$$
\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t)
$$
Where:
- $\theta$ = model parameters
- $\eta$ = learning rate
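A sketch of one training step in PyTorch, reusing the `mlp` from the sketch above (the SGD optimizer, cross-entropy loss, batch size, and learning rate are all illustrative choices):

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1)  # eta = 0.1
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 784)          # a batch of 32 illustrative inputs
targets = torch.randint(0, 10, (32,))  # illustrative class labels

loss = loss_fn(mlp(inputs), targets)   # forward pass and L(theta)

optimizer.zero_grad()
loss.backward()    # backpropagation: computes grad_theta L(theta)
optimizer.step()   # theta <- theta - eta * grad
```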
Role in Modern Architectures
Although more complex architectures exist, feedforward networks remain core components in modern models.
Examples include:
- Transformer feedforward blocks
- classification heads in language models
- representation layers in vision models
In Transformers, a position-wise Feedforward Network (FFN) follows each attention layer, transforming every token representation independently.
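A minimal sketch of such a block in PyTorch (the 4x hidden expansion and GELU activation follow common practice; exact sizes and activations vary across models):

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN applied to each token representation independently."""
    def __init__(self, d_model=512, d_ff=2048):  # 4x expansion is conventional
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return self.net(x)
```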
Advantages
Feedforward networks are:
- simple to implement
- computationally efficient
- theoretically well understood
- highly flexible
They serve as the foundation for many deep learning systems.
Limitations
Feedforward networks lack mechanisms for modeling sequential relationships.
They cannot naturally represent:
- temporal dependencies
- long-range context
- sequential memory
Architectures such as RNNs and Transformers address these limitations.
Summary
Feedforward networks are neural networks where information moves strictly from input to output through a series of transformations.
They form the foundational architecture of deep learning and remain essential components within modern models such as Transformers.
Related Concepts
- Multilayer Perceptron (MLP)
- Backpropagation
- Activation Functions
- Transformer Architecture
- Convolutional Neural Networks
- Universal Approximation Theorem