Feedforward Networks

Short Definition

Feedforward Networks are neural networks in which information flows strictly from the input layer to the output layer without cycles or feedback loops. Each layer transforms the output of the previous layer through an affine transformation (weights and bias) followed by a nonlinear activation.

They are the most fundamental building block of deep learning systems.

Definition

A feedforward neural network computes a function by passing input data through a sequence of layers.

Given an input vector:

\[
x
\]

the network computes:

\[
y = f_L(f_{L-1}(\dots f_1(x)))
\]

Each layer performs the operation:

\[
h^{(l)} = \sigma(W^{(l)}h^{(l-1)} + b^{(l)})
\]

Where:

  • \(h^{(l)}\) = hidden representation at layer \(l\)
  • \(W^{(l)}\) = weight matrix
  • \(b^{(l)}\) = bias vector
  • \(\sigma\) = nonlinear activation function

The first layer receives the input vector, and the final layer produces the output prediction.
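The per-layer operation above can be sketched in a few lines of numpy. This is a minimal illustration, not a library implementation; the layer sizes, random weights, and the choice of ReLU as \(\sigma\) are assumptions for the example.

```python
import numpy as np

def layer(h_prev, W, b):
    """One feedforward layer: affine transformation followed by ReLU."""
    z = W @ h_prev + b          # W^{(l)} h^{(l-1)} + b^{(l)}
    return np.maximum(z, 0.0)   # sigma = ReLU

# Illustrative sizes: 3 inputs, 4 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(4)
x = np.array([1.0, -2.0, 0.5])

h = layer(x, W, b)  # hidden representation h^{(1)}
```

Stacking calls to `layer` (with a fresh `W` and `b` for each) gives the composed function \(f_L(f_{L-1}(\dots f_1(x)))\) from the definition.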

Core Concept

Feedforward networks propagate information in a single direction:

Input → Hidden Layer → Hidden Layer → Output

There are no loops or recurrent connections.

This means the network does not maintain memory of previous inputs.

Minimal Conceptual Illustration

Example network with two hidden layers:

x → Linear → ReLU → Linear → ReLU → Linear → y

Or visually:

Input Layer
    ↓
Hidden Layer
    ↓
Hidden Layer
    ↓
Output Layer

Each layer learns progressively more abstract representations.
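The two-hidden-layer chain x → Linear → ReLU → Linear → ReLU → Linear → y can be written as a short forward pass. The layer widths below (4 → 8 → 8 → 2) are arbitrary illustrative choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, params):
    """Forward pass: Linear -> ReLU -> Linear -> ReLU -> Linear."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(W1 @ x + b1)    # first hidden layer
    h2 = relu(W2 @ h1 + b2)   # second hidden layer
    return W3 @ h2 + b3       # output layer (no activation)

rng = np.random.default_rng(1)
params = [
    (rng.normal(size=(8, 4)), np.zeros(8)),  # 4 inputs  -> 8 hidden
    (rng.normal(size=(8, 8)), np.zeros(8)),  # 8 hidden  -> 8 hidden
    (rng.normal(size=(2, 8)), np.zeros(2)),  # 8 hidden  -> 2 outputs
]
y = mlp_forward(rng.normal(size=4), params)
```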

Multilayer Perceptron (MLP)

The most common type of feedforward network is the Multilayer Perceptron (MLP).

Characteristics:

  • fully connected layers
  • nonlinear activations
  • trained using gradient descent

MLPs are used for:

  • classification
  • regression
  • representation learning

Universal Function Approximation

Feedforward networks have strong theoretical properties.

The Universal Approximation Theorem states that a feedforward network with a single hidden layer and sufficiently many neurons can approximate any continuous function on a compact domain to arbitrary accuracy, given a suitable nonlinear activation.

This makes feedforward networks powerful general-purpose models.

Activation Functions

Nonlinear activation functions allow networks to model complex relationships.

Common choices include:

  • ReLU
  • Sigmoid
  • Tanh
  • GELU

Without nonlinear activations, the network would collapse into a single linear transformation.
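The four activations listed above have compact closed forms. The GELU variant below uses the widely used tanh approximation rather than the exact Gaussian CDF form; that choice is an assumption of this sketch.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Logistic function, squashes to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent, squashes to (-1, 1)."""
    return np.tanh(z)

def gelu(z):
    """GELU via the common tanh approximation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))
```

All four are elementwise, so they apply unchanged to scalars, vectors, or whole weight-matrix outputs.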

Training

Feedforward networks are trained using backpropagation.

A loss function measures prediction error:

\[
\mathcal{L}(\theta)
\]

Model parameters are updated using gradient descent:

\[
\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t)
\]

Where:

  • \(\theta\) = parameters
  • \(\eta\) = learning rate
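The update rule is one line of code. As a self-contained illustration, the loop below minimizes the toy loss \(\mathcal{L}(\theta) = \lVert\theta\rVert^2\), whose gradient is \(2\theta\); the loss, starting point, and learning rate are assumptions for the example, not part of the original text.

```python
import numpy as np

def gd_step(theta, grad, eta):
    """One gradient descent update: theta_{t+1} = theta_t - eta * grad."""
    return theta - eta * grad

theta = np.array([2.0, -4.0])      # arbitrary starting parameters
for _ in range(100):
    grad = 2.0 * theta             # gradient of L(theta) = ||theta||^2
    theta = gd_step(theta, grad, eta=0.1)
```

Each step shrinks the parameters by a factor of \(1 - 2\eta\), so they converge toward the minimizer at zero.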

Role in Modern Architectures

Although more complex architectures exist, feedforward networks remain core components in modern models.

Examples include:

  • Transformer feedforward blocks
  • classification heads in language models
  • representation layers in vision models

In Transformers, the Feedforward Network (FFN) is applied after the attention sublayer, transforming each token representation independently (position-wise).

Advantages

Feedforward networks are:

  • simple to implement
  • computationally efficient
  • theoretically well understood
  • highly flexible

They serve as the foundation for many deep learning systems.

Limitations

Feedforward networks lack mechanisms for modeling sequential relationships.

They cannot naturally represent:

  • temporal dependencies
  • long-range context
  • sequential memory

Architectures such as RNNs and Transformers address these limitations.

Summary

Feedforward networks are neural networks where information moves strictly from input to output through a series of transformations.

They form the foundational architecture of deep learning and remain essential components within modern models such as Transformers.

Related Concepts