Dilated Convolutions vs Stride

Short Definition

Dilated convolutions expand the receptive field by spacing out kernel elements without reducing spatial resolution, while stride reduces spatial resolution by skipping positions during convolution.

Dilation enlarges context without shrinking feature maps.
Stride shrinks feature maps while enlarging context.

Definition

Both dilation and stride modify how a convolutional kernel moves across input data, but they serve different architectural purposes.

  • Stride controls how far the kernel moves each step.
  • Dilation controls spacing between kernel elements.

These choices influence:

  • Receptive field growth
  • Spatial resolution
  • Information retention
  • Computational cost

Though both increase the effective receptive field, they do so in fundamentally different ways.

I. Strided Convolution

In strided convolution:

  • The kernel moves by S pixels per step.
  • Output spatial size decreases.

Output size formula:

[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]

Example:

Input: 8×8
Kernel: 3×3
Stride: 2

Output: 3×3

Stride performs downsampling.
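The output-size formula can be checked with a small helper. This is an illustrative sketch (the name `conv_output_size` and the optional padding/dilation parameters are my own additions, not from any specific library):

```python
def conv_output_size(n: int, k: int, s: int = 1, d: int = 1, p: int = 0) -> int:
    """Spatial output size of a convolution along one axis.

    Floor division drops positions where the kernel no longer fits;
    dilation enlarges the kernel's footprint before the size is computed.
    """
    effective_k = k + (k - 1) * (d - 1)
    return (n + 2 * p - effective_k) // s + 1

# The example above: 8x8 input, 3x3 kernel, stride 2 -> 3x3 output.
print(conv_output_size(8, 3, s=2))        # 3
# Same input at stride 1 with dilation 2: the dilated kernel spans
# 5 positions, so the valid output shrinks more at the borders.
print(conv_output_size(8, 3, s=1, d=2))   # 4
```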

II. Dilated Convolution

Dilated convolution (also called atrous convolution):

  • Inserts gaps between kernel elements.
  • Increases receptive field without changing resolution (if stride = 1).

Dilated kernel example (rate = 2):

Normal kernel:
[ a b c ]

Dilated (the gaps are skipped input positions, not learned zero weights):
[ a _ b _ c ]

Effective receptive field grows.

No downsampling occurs.
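A minimal 1-D sketch makes the gap-skipping concrete. This is pure Python, "valid" convolution only, and the names are my own:

```python
def dilated_conv1d(x, w, d=1):
    """Valid 1-D convolution where kernel taps are spaced d samples apart."""
    k = len(w)
    span = (k - 1) * d + 1  # effective receptive field of one kernel
    return [sum(w[j] * x[i + j * d] for j in range(k))
            for i in range(len(x) - span + 1)]

x = list(range(8))
w = [1, 1, 1]
print(dilated_conv1d(x, w, d=1))  # sums of 3 adjacent samples
print(dilated_conv1d(x, w, d=2))  # sums of samples i, i+2, i+4: wider context
```

With stride 1 and enough padding the output length would match the input; the "valid" version shown here only shrinks slightly at the borders.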

Minimal Conceptual Illustration


Stride:
Kernel moves farther → output smaller

Dilation:
Kernel stretches → output same size

Stride compresses.
Dilation expands.

Receptive Field Comparison

Let:

  • Kernel size = 3
  • Stride = 2
  • Dilation rate = 2

Effective receptive field:

Stride increases coverage indirectly via resolution reduction.

Dilation increases coverage directly by spreading kernel elements.

With dilation rates that double layer by layer (as in WaveNet-style stacks), the receptive field grows exponentially with depth.
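The exponential-growth claim can be checked numerically: each stride-1 layer with dilation d widens the stack's receptive field by (K − 1)·d. A sketch (helper name is my own):

```python
def stacked_receptive_field(kernel: int, dilations: list) -> int:
    """Receptive field of stacked stride-1 convolutions: a layer with
    dilation d widens the stack's receptive field by (kernel - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Doubling rates give exponential growth in depth:
print(stacked_receptive_field(3, [1, 2, 4, 8]))   # 31
# Plain convolutions of the same depth grow only linearly:
print(stacked_receptive_field(3, [1, 1, 1, 1]))   # 9
```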

Spatial Resolution Effects

Method      Output Size   Receptive Field   Resolution
Stride      Reduced       Increased         Lower
Dilation    Preserved     Increased         Same

Stride trades resolution for efficiency.
Dilation preserves resolution while expanding context.


Information Flow

Stride:

  • Discards intermediate positions.
  • Compresses representation.
  • Reduces computational load.

Dilation:

  • Preserves feature map size.
  • Retains fine-grained spatial information.
  • Adds long-range context.

Stride reduces data.
Dilation preserves detail.

Use Cases

Strided Convolution:

  • Hierarchical CNN design
  • Feature compression
  • Image classification
  • Efficiency-focused models

Dilated Convolution:

  • Semantic segmentation
  • Dense prediction tasks
  • Audio modeling
  • Context-sensitive applications

Dilation is common in tasks requiring spatial precision.

Computational Trade-Off

Stride:

  • Reduces compute in later layers.
  • Improves inference speed.

Dilation:

  • Keeps feature maps large.
  • More computationally intensive.
  • Higher memory footprint.

Efficiency vs resolution trade-off.
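The trade-off can be quantified roughly by counting multiply-accumulates for one layer. A sketch under simplifying assumptions (square input, no padding or bias; names are my own):

```python
def conv_macs(n: int, k: int, c_in: int, c_out: int, stride: int = 1) -> int:
    """Multiply-accumulates for one k x k convolution on an n x n input."""
    out = (n - k) // stride + 1
    return out * out * k * k * c_in * c_out

# With "same" padding, a dilated stride-1 layer keeps the full-resolution
# grid, while a stride-2 layer computes 4x fewer output positions here:
stride1 = conv_macs(64, 3, 64, 64, stride=1)
stride2 = conv_macs(64, 3, 64, 64, stride=2)
print(stride1 // stride2)  # 4
```

Every layer after the strided one also operates on the smaller map, so the savings compound with depth.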

Architectural Implications

Combining both:

  • Early layers often use stride.
  • Later layers may use dilation.

Modern segmentation architectures use:

  • Reduced downsampling
  • Increased dilation

to preserve detail while expanding context.
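That pattern can be traced with the output-size rule. This sketch follows a 224×224 input through two strided layers and two dilated ones; the layer specs are illustrative, not taken from any specific architecture:

```python
def trace_resolution(n, layers):
    """Track feature-map size through (kernel, stride, dilation, padding) layers."""
    for k, s, d, p in layers:
        eff = k + (k - 1) * (d - 1)      # dilated kernel footprint
        n = (n + 2 * p - eff) // s + 1
    return n

final = trace_resolution(224, [
    (3, 2, 1, 1),   # early stride 2: 224 -> 112
    (3, 2, 1, 1),   # stride 2 again: 112 -> 56
    (3, 1, 2, 2),   # dilation 2, padding 2: stays 56
    (3, 1, 4, 4),   # dilation 4, padding 4: stays 56
])
print(final)  # 56 -- context keeps growing, resolution stops shrinking
```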

Relationship to Receptive Fields

Effective receptive field size:

Normal convolution:

[
R = K
]

Dilated convolution:

[
R = K + (K - 1)(d - 1)
]

Where:
d = dilation rate

Stacking dilated convolutions with increasing rates allows exponential receptive field growth without pooling.
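The single-layer formula, as a one-line check (the function name is mine):

```python
def effective_kernel(k: int, d: int) -> int:
    """R = K + (K - 1)(d - 1): footprint of a k-tap kernel at dilation d."""
    return k + (k - 1) * (d - 1)

print(effective_kernel(3, 1))  # 3 -- ordinary convolution
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 4))  # 9
```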

Risk of Gridding Artifacts

High dilation rates may cause:

  • Checkerboard or gridding artifacts
  • Sparse sampling patterns

Design must balance dilation rates carefully.
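The gridding effect can be seen by tracking which input offsets ever reach one output unit when every layer reuses the same dilation rate. An illustrative sketch (names are my own):

```python
def reachable_offsets(depth: int, d: int, k: int = 3):
    """Input offsets visible to one output unit after `depth` stride-1
    layers, all with dilation d and an odd kernel size k."""
    offsets = {0}
    for _ in range(depth):
        offsets = {o + j * d for o in offsets
                   for j in range(-(k // 2), k // 2 + 1)}
    return sorted(offsets)

# Reusing rate 2 everywhere samples only even offsets -- odd-offset inputs
# never contribute, producing a grid-like blind spot:
print(reachable_offsets(3, 2))  # [-6, -4, -2, 0, 2, 4, 6]
```

Mixing different rates across layers is one common remedy, since one layer's gaps are covered by another's taps.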

Stride vs Dilation Summary

Aspect                          Stride               Dilation
Reduces spatial size            Yes                  No
Preserves resolution            No                   Yes
Expands receptive field         Yes                  Yes
Efficient for classification    Yes                  Less so
Good for dense prediction       Less so              Yes
Risk of artifacts               Downsampling loss    Gridding artifacts

Long-Term Architectural Relevance

Stride supports hierarchical abstraction.

Dilation supports context-aware precision.

Together, they define spatial modeling strategies in CNNs.

Modern architectures strategically balance both.

Related Concepts

  • Convolution Operation
  • Stride and Padding
  • Receptive Fields
  • Strided Convolution vs Pooling
  • Same vs Valid Padding
  • Checkerboard Artifacts
  • Feature Maps