Dilated Convolutions vs Stride

Short Definition

Dilated convolutions expand the receptive field by spacing out kernel elements without reducing spatial resolution, while stride reduces spatial resolution by skipping positions during convolution.

Dilation enlarges context without shrinking feature maps.
Stride shrinks feature maps while enlarging context.

Definition

Both dilation and stride modify how a convolutional kernel moves across input data, but they serve different architectural purposes.

  • Stride controls how far the kernel moves each step.
  • Dilation controls spacing between kernel elements.

These choices influence:

  • Receptive field growth
  • Spatial resolution
  • Information retention
  • Computational cost

Though both increase the effective receptive field, they do so in fundamentally different ways.

I. Strided Convolution

In strided convolution:

  • The kernel moves by S pixels per step.
  • Output spatial size decreases.

Output size formula:

[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]

Example:

Input: 8×8
Kernel: 3×3
Stride: 2

Output: 3×3

Stride performs downsampling.
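The output-size formula can be checked with a small helper. This is an illustrative sketch (the name `conv_output_size` and the optional padding/dilation parameters are my own additions, not from any specific library):

```python
def conv_output_size(n: int, k: int, s: int = 1, d: int = 1, p: int = 0) -> int:
    """Spatial output size of a convolution along one axis.

    Floor division drops positions where the kernel no longer fits;
    dilation enlarges the kernel's footprint before the size is computed.
    """
    effective_k = k + (k - 1) * (d - 1)
    return (n + 2 * p - effective_k) // s + 1

# The example above: 8x8 input, 3x3 kernel, stride 2 -> 3x3 output.
print(conv_output_size(8, 3, s=2))        # 3
# Same input at stride 1 with dilation 2: the dilated kernel spans
# 5 positions, so the valid output shrinks more at the borders.
print(conv_output_size(8, 3, s=1, d=2))   # 4
```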

II. Dilated Convolution

Dilated convolution (also called atrous convolution):

  • Inserts gaps between kernel elements.
  • Increases receptive field without changing resolution (if stride = 1).

Dilated kernel example (rate = 2):

Normal kernel:
[ a b c ]

Dilated (the gaps are skipped input positions, not learned zero weights):
[ a _ b _ c ]

Effective receptive field grows.

No downsampling occurs.
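A minimal 1-D sketch makes the gap-skipping concrete. This is pure Python, "valid" convolution only, and the names are my own:

```python
def dilated_conv1d(x, w, d=1):
    """Valid 1-D convolution where kernel taps are spaced d samples apart."""
    k = len(w)
    span = (k - 1) * d + 1  # effective receptive field of one kernel
    return [sum(w[j] * x[i + j * d] for j in range(k))
            for i in range(len(x) - span + 1)]

x = list(range(8))
w = [1, 1, 1]
print(dilated_conv1d(x, w, d=1))  # sums of 3 adjacent samples
print(dilated_conv1d(x, w, d=2))  # sums of samples i, i+2, i+4: wider context
```

With stride 1 and enough padding the output length would match the input; the "valid" version shown here only shrinks slightly at the borders.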

Minimal Conceptual Illustration


Stride:
Kernel moves farther → output smaller

Dilation:
Kernel stretches → output same size

Stride compresses.
Dilation expands.

Receptive Field Comparison

Let:

  • Kernel size = 3
  • Stride = 2
  • Dilation rate = 2

Effective receptive field:

Stride increases coverage indirectly via resolution reduction.

Dilation increases coverage directly by spreading kernel elements.

With dilation rates that double layer by layer (as in WaveNet-style stacks), the receptive field grows exponentially with depth.
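The exponential-growth claim can be checked numerically: each stride-1 layer with dilation d widens the stack's receptive field by (K − 1)·d. A sketch (helper name is my own):

```python
def stacked_receptive_field(kernel: int, dilations: list) -> int:
    """Receptive field of stacked stride-1 convolutions: a layer with
    dilation d widens the stack's receptive field by (kernel - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Doubling rates give exponential growth in depth:
print(stacked_receptive_field(3, [1, 2, 4, 8]))   # 31
# Plain convolutions of the same depth grow only linearly:
print(stacked_receptive_field(3, [1, 1, 1, 1]))   # 9
```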

Spatial Resolution Effects

Method      Output Size   Receptive Field   Resolution
Stride      Reduced       Increased         Lower
Dilation    Preserved     Increased         Same

Stride trades resolution for efficiency.
Dilation preserves resolution while expanding context.


Information Flow

Stride:

  • Discards intermediate positions.
  • Compresses representation.
  • Reduces computational load.

Dilation:

  • Preserves feature map size.
  • Retains fine-grained spatial information.
  • Adds long-range context.

Stride reduces data.
Dilation preserves detail.

Use Cases

Strided Convolution:

  • Hierarchical CNN design
  • Feature compression
  • Image classification
  • Efficiency-focused models

Dilated Convolution:

  • Semantic segmentation
  • Dense prediction tasks
  • Audio modeling
  • Context-sensitive applications

Dilation is common in tasks requiring spatial precision.

Computational Trade-Off

Stride:

  • Reduces compute in later layers.
  • Improves inference speed.

Dilation:

  • Keeps feature maps large.
  • More computationally intensive.
  • Higher memory footprint.

Efficiency vs resolution trade-off.
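The trade-off can be quantified roughly by counting multiply-accumulates for one layer. A sketch under simplifying assumptions (square input, no padding or bias; names are my own):

```python
def conv_macs(n: int, k: int, c_in: int, c_out: int, stride: int = 1) -> int:
    """Multiply-accumulates for one k x k convolution on an n x n input."""
    out = (n - k) // stride + 1
    return out * out * k * k * c_in * c_out

# With "same" padding, a dilated stride-1 layer keeps the full-resolution
# grid, while a stride-2 layer computes 4x fewer output positions here:
stride1 = conv_macs(64, 3, 64, 64, stride=1)
stride2 = conv_macs(64, 3, 64, 64, stride=2)
print(stride1 // stride2)  # 4
```

Every layer after the strided one also operates on the smaller map, so the savings compound with depth.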

Architectural Implications

Combining both:

  • Early layers often use stride.
  • Later layers may use dilation.

Modern segmentation architectures use:

  • Reduced downsampling
  • Increased dilation

to preserve detail while expanding context.
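That pattern can be traced with the output-size rule. This sketch follows a 224×224 input through two strided layers and two dilated ones; the layer specs are illustrative, not taken from any specific architecture:

```python
def trace_resolution(n, layers):
    """Track feature-map size through (kernel, stride, dilation, padding) layers."""
    for k, s, d, p in layers:
        eff = k + (k - 1) * (d - 1)      # dilated kernel footprint
        n = (n + 2 * p - eff) // s + 1
    return n

final = trace_resolution(224, [
    (3, 2, 1, 1),   # early stride 2: 224 -> 112
    (3, 2, 1, 1),   # stride 2 again: 112 -> 56
    (3, 1, 2, 2),   # dilation 2, padding 2: stays 56
    (3, 1, 4, 4),   # dilation 4, padding 4: stays 56
])
print(final)  # 56 -- context keeps growing, resolution stops shrinking
```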

Relationship to Receptive Fields

Effective receptive field size:

Normal convolution:

[
R = K
]

Dilated convolution:

[
R = K + (K - 1)(d - 1)
]

Where:
d = dilation rate

Stacking dilated convolutions with increasing rates allows exponential receptive field growth without pooling.
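The single-layer formula, as a one-line check (the function name is mine):

```python
def effective_kernel(k: int, d: int) -> int:
    """R = K + (K - 1)(d - 1): footprint of a k-tap kernel at dilation d."""
    return k + (k - 1) * (d - 1)

print(effective_kernel(3, 1))  # 3 -- ordinary convolution
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 4))  # 9
```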

Risk of Gridding Artifacts

High dilation rates may cause:

  • Checkerboard or gridding artifacts
  • Sparse sampling patterns

Design must balance dilation rates carefully.
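The gridding effect can be seen by tracking which input offsets ever reach one output unit when every layer reuses the same dilation rate. An illustrative sketch (names are my own):

```python
def reachable_offsets(depth: int, d: int, k: int = 3):
    """Input offsets visible to one output unit after `depth` stride-1
    layers, all with dilation d and an odd kernel size k."""
    offsets = {0}
    for _ in range(depth):
        offsets = {o + j * d for o in offsets
                   for j in range(-(k // 2), k // 2 + 1)}
    return sorted(offsets)

# Reusing rate 2 everywhere samples only even offsets -- odd-offset inputs
# never contribute, producing a grid-like blind spot:
print(reachable_offsets(3, 2))  # [-6, -4, -2, 0, 2, 4, 6]
```

Mixing different rates across layers is one common remedy, since one layer's gaps are covered by another's taps.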

Stride vs Dilation Summary

Aspect                          Stride               Dilation
Reduces spatial size            Yes                  No
Preserves resolution            No                   Yes
Expands receptive field         Yes                  Yes
Efficient for classification    Yes                  Less so
Good for dense prediction       Less so              Yes
Risk of artifacts               Downsampling loss    Gridding artifacts

Long-Term Architectural Relevance

Stride supports hierarchical abstraction.

Dilation supports context-aware precision.

Together, they define spatial modeling strategies in CNNs.

Modern architectures strategically balance both.

Related Concepts

  • Convolution Operation
  • Stride and Padding
  • Receptive Fields
  • Strided Convolution vs Pooling
  • Same vs Valid Padding
  • Checkerboard Artifacts
  • Feature Maps