Short Definition
Dilated convolutions expand the receptive field by spacing out kernel elements without reducing spatial resolution, while stride reduces spatial resolution by skipping positions during convolution.
Dilation enlarges context without shrinking feature maps.
Stride shrinks feature maps while enlarging context.
Definition
Both dilation and stride modify how a convolutional kernel moves across input data, but they serve different architectural purposes.
- Stride controls how far the kernel moves each step.
- Dilation controls spacing between kernel elements.
These choices influence:
- Receptive field growth
- Spatial resolution
- Information retention
- Computational cost
Though both increase the effective receptive field, they do so in fundamentally different ways.
I. Strided Convolution
In strided convolution:
- The kernel moves by S pixels per step.
- Output spatial size decreases.
Output size formula (valid padding, no dilation):
[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]
Example:
Input: 8×8
Kernel: 3×3
Stride: 2
Output: 3×3
Stride performs downsampling.
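The output-size arithmetic above can be checked with a small helper. This is a sketch; `conv_output_size` is a hypothetical name, not a library function:

```python
def conv_output_size(n, k, stride=1, dilation=1):
    """Valid (no padding) output length for input size n, kernel size k."""
    span = k + (k - 1) * (dilation - 1)  # effective kernel span after dilation
    return (n - span) // stride + 1      # floor division matches the formula

print(conv_output_size(8, 3, stride=2))  # 3, matching the 8x8 -> 3x3 example
```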
II. Dilated Convolution
Dilated convolution (also called atrous convolution):
- Inserts gaps between kernel elements.
- Increases receptive field without changing resolution (if stride = 1).
Dilated kernel example (rate = 2):
Normal kernel:
[ a b c ]
Dilated (the zeros mark skipped input positions):
[ a 0 b 0 c ]
The effective receptive field grows from 3 to 5.
No downsampling occurs.
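A naive 1-D sketch makes this concrete (hypothetical `dilated_conv1d`, plain Python for clarity): dilation widens the span each output sees, yet an output is still produced at every stride-1 position.

```python
def dilated_conv1d(x, w, dilation=1):
    """Naive valid-mode 1-D convolution (cross-correlation) with dilation."""
    k = len(w)
    span = k + (k - 1) * (dilation - 1)          # effective kernel span
    return [sum(w[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]   # stride is fixed at 1

x = list(range(10))
print(len(dilated_conv1d(x, [1, 1, 1], dilation=1)))  # 8 outputs
print(len(dilated_conv1d(x, [1, 1, 1], dilation=2)))  # 6 outputs: wider span, no striding
```

Each dilation-2 output mixes inputs two positions apart (e.g. x[0], x[2], x[4]), so context widens without discarding output positions.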
Minimal Conceptual Illustration
Stride:
Kernel moves farther → output smaller
Dilation:
Kernel stretches → output same size
Stride compresses.
Dilation expands.
Receptive Field Comparison
Let:
- Kernel size = 3
- Stride = 2
- Dilation rate = 2
Effective receptive field:
Stride increases coverage indirectly via resolution reduction.
Dilation increases coverage directly by spreading kernel elements.
When dilation rates double layer by layer (as in WaveNet-style stacks), the receptive field grows exponentially with depth.
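This growth can be illustrated numerically by summing each stride-1 layer's contribution of (K − 1) · d to the receptive field (a sketch; `stacked_receptive_field` is a hypothetical helper):

```python
def stacked_receptive_field(k, dilations):
    """Receptive field of stacked stride-1 convolutions with kernel size k."""
    r = 1
    for d in dilations:
        r += (k - 1) * d  # each layer widens the field by (k - 1) * d
    return r

print(stacked_receptive_field(3, [1, 2, 4, 8]))  # 31: doubling rates, rapid growth
print(stacked_receptive_field(3, [1, 1, 1, 1]))  # 9: fixed rate, only linear growth
```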
Spatial Resolution Effects
| Method | Output Size | Receptive Field | Resolution |
|---|---|---|---|
| Stride | Reduced | Increased | Lower |
| Dilation | Preserved | Increased | Same |
Stride trades resolution for efficiency.
Dilation preserves resolution while expanding context.
Information Flow
Stride:
- Discards intermediate positions.
- Compresses representation.
- Reduces computational load.
Dilation:
- Preserves feature map size.
- Retains fine-grained spatial information.
- Adds long-range context.
Stride reduces data.
Dilation preserves detail.
Use Cases
Strided Convolution:
- Hierarchical CNN design
- Feature compression
- Image classification
- Efficiency-focused models
Dilated Convolution:
- Semantic segmentation
- Dense prediction tasks
- Audio modeling
- Context-sensitive applications
Dilation is common in tasks requiring spatial precision.
Computational Trade-Off
Stride:
- Reduces compute in later layers.
- Improves inference speed.
Dilation:
- Keeps feature maps large.
- More computationally intensive.
- Higher memory footprint.
Efficiency vs resolution trade-off.
Architectural Implications
Combining both:
- Early layers often use stride.
- Later layers may use dilation.
Modern segmentation architectures use:
- Reduced downsampling
- Increased dilation
to preserve detail while expanding context.
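A minimal sketch of this pattern (hypothetical `conv1d` helper, pure Python): an early strided layer downsamples, then a later dilated layer widens context at the reduced resolution.

```python
def conv1d(x, w, stride=1, dilation=1):
    """Naive valid-mode 1-D convolution with stride and dilation."""
    k = len(w)
    span = k + (k - 1) * (dilation - 1)
    return [sum(w[j] * x[i + j * dilation] for j in range(k))
            for i in range(0, len(x) - span + 1, stride)]

x = [float(i) for i in range(32)]
w = [1.0, 1.0, 1.0]
h = conv1d(x, w, stride=2)    # early layer: 32 -> 15 positions (downsampling)
y = conv1d(h, w, dilation=2)  # later layer: 15 -> 11, wider context, same scale
print(len(h), len(y))         # 15 11
```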
Relationship to Receptive Fields
Effective receptive field size:
Normal convolution:
[
R = K
]
Dilated convolution:
[
R = K + (K - 1)(d - 1)
]
Where:
- K = kernel size
- d = dilation rate
With rates that increase across layers, dilation allows exponential receptive field growth without pooling.
Risk of Gridding Artifacts
High dilation rates may cause:
- Checkerboard or gridding artifacts
- Sparse sampling patterns
Design must balance dilation rates carefully.
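The gridding effect is easy to see by tracking which input positions can influence one output when every layer uses the same dilation rate (hypothetical `reachable` helper):

```python
def reachable(k, d, layers):
    """Input offsets that influence output offset 0 through stacked layers,
    each with kernel size k and the same dilation rate d."""
    pos = {0}
    for _ in range(layers):
        pos = {p + j * d for p in pos for j in range(k)}
    return sorted(pos)

print(reachable(3, 2, layers=2))  # [0, 2, 4, 6, 8]: odd offsets are never sampled
```

Because every tap lands on a multiple of d, some input positions are never seen, which is why designs such as hybrid dilation schedules vary the rate across layers.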
Stride vs Dilation Summary
| Aspect | Stride | Dilation |
|---|---|---|
| Reduces spatial size | Yes | No |
| Preserves resolution | No | Yes |
| Expands receptive field | Yes | Yes |
| Efficient for classification | Yes | Less so |
| Good for dense prediction | Less so | Yes |
| Risk of artifacts | Downsampling loss | Gridding artifacts |
Long-Term Architectural Relevance
Stride supports hierarchical abstraction.
Dilation supports context-aware precision.
Together, they define spatial modeling strategies in CNNs.
Modern architectures strategically balance both.
Related Concepts
- Convolution Operation
- Stride and Padding
- Receptive Fields
- Strided Convolution vs Pooling
- Same vs Valid Padding
- Checkerboard Artifacts
- Feature Maps