Same vs Valid Padding

Short Definition

Same padding preserves the spatial dimensions of an input after convolution (at stride 1), while valid padding applies no padding and produces a smaller output.

Same = preserve size.
Valid = no padding, shrink size.

Definition

In convolutional neural networks (CNNs), padding determines how the convolutional kernel interacts with the borders of an input.

Two common padding strategies are:

  1. Same Padding
  2. Valid Padding

Padding influences:

  • Output dimensions
  • Edge behavior
  • Receptive field coverage
  • Information retention at boundaries

I. Valid Padding

Valid padding applies no padding to the input.

The kernel is applied only at positions where it fully overlaps the input.

Output size:

[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]

Where:

  • N = input size
  • K = kernel size
  • S = stride

Result:

Output size is smaller than input.

Example:

Input: 5×5
Kernel: 3×3
Stride: 1

Output: 3×3

Valid padding reduces spatial resolution.
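The valid-padding formula above can be sketched as a small helper (the function name is illustrative, not from any library):

```python
def valid_output_size(n: int, k: int, s: int = 1) -> int:
    """Output size of a valid (no-padding) convolution: floor((N - K) / S) + 1."""
    return (n - k) // s + 1

print(valid_output_size(5, 3, 1))  # 3, matching the 5x5 input / 3x3 kernel example
```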

II. Same Padding

Same padding adds zeros around the input so that the output size equals the input size (for stride = 1).

Padding size is chosen so:

[
\text{Output} = N
]

For odd kernel size K:

[
\text{Padding} = \frac{K - 1}{2}
]

Example:

Input: 5×5
Kernel: 3×3
Stride: 1

Padding: 1 pixel on each side

Output: 5×5

Same padding preserves dimensions.
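The same-padding rule for odd kernels can be checked numerically. A minimal sketch (helper names are illustrative) using the general output-size formula floor((N + 2P - K) / S) + 1:

```python
def same_padding(k: int) -> int:
    """Per-side zero padding that preserves size for an odd kernel K at stride 1."""
    return (k - 1) // 2

def padded_output_size(n: int, k: int, p: int, s: int = 1) -> int:
    """Convolution output size with padding P: floor((N + 2P - K) / S) + 1."""
    return (n + 2 * p - k) // s + 1

p = same_padding(3)
print(p)                            # 1 pixel on each side
print(padded_output_size(5, 3, p))  # 5 -- the 5x5 input is preserved
```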

Minimal Conceptual Illustration


Valid:
[Input] → Convolution → Smaller output

Same:
[Input] → Zero padding → Convolution → Same size output

Padding determines border behavior.
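The two pipelines above can be demonstrated with a naive NumPy convolution (stride 1, square inputs; written for shape illustration, not performance):

```python
import numpy as np

def conv2d(x, kernel, padding=0):
    """Naive 2-D convolution (stride 1) to illustrate output shapes."""
    if padding:
        x = np.pad(x, padding)  # zero padding on every side
    n, k = x.shape[0], kernel.shape[0]
    out = n - k + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            result[i, j] = np.sum(x[i:i + k, j:j + k] * kernel)
    return result

x = np.ones((5, 5))
kernel = np.ones((3, 3))
print(conv2d(x, kernel).shape)             # (3, 3)  valid: smaller output
print(conv2d(x, kernel, padding=1).shape)  # (5, 5)  same: size preserved
```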

Why Padding Matters

1. Feature Map Size Control

Same padding:

  • Preserves resolution.
  • Easier stacking of layers.
  • Common in deep CNNs.

Valid padding:

  • Shrinks feature maps.
  • Useful for progressive spatial compression.

2. Edge Information

Valid padding:

  • Partially ignores border pixels, since they appear in fewer kernel positions.
  • Avoids artificial boundary values, reducing edge artifacts.

Same padding:

  • Introduces artificial zero context.
  • May distort edge activations.

Padding changes how edge features are treated.

Relationship to Receptive Fields

Valid padding:

  • Shrinks spatial dimension.
  • Receptive field grows relative to feature map size.

Same padding:

  • Keeps spatial size constant.
  • Receptive field grows more gradually.

Padding affects hierarchical representation scaling.
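For a stack of identical stride-1 convolutions, the receptive field grows by K − 1 per layer. A small sketch of the standard recurrence (the function is illustrative, assuming equal kernels and strides throughout the stack):

```python
def receptive_field(num_layers: int, k: int = 3, s: int = 1) -> int:
    """Receptive field of a stack of identical KxK conv layers.

    Standard recurrence: each layer adds (K - 1) * jump, where jump is the
    product of all earlier strides. For stride 1 this reduces to 1 + L*(K-1).
    """
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (k - 1) * jump
        jump *= s
    return rf

print([receptive_field(l) for l in (1, 2, 3)])  # [3, 5, 7]
```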

Interaction with Stride

If stride > 1:

Same padding no longer strictly preserves size.

General formula:

[
\text{Output} = \left\lceil \frac{N}{S} \right\rceil
]

Same padding preserves relative scale but not exact size for S > 1.

Stride and padding interact jointly.
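The ceiling formula can be sketched directly (this matches, e.g., TensorFlow's "SAME" convention; the function name is illustrative):

```python
import math

def same_output_size(n: int, s: int) -> int:
    """Same-padding output size at stride S: ceil(N / S)."""
    return math.ceil(n / s)

print(same_output_size(5, 1))  # 5 -- size preserved at stride 1
print(same_output_size(5, 2))  # 3 -- size no longer preserved for S > 1
```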

Relationship to Strided Convolution vs Pooling

Pooling often uses valid-style behavior (no padding).

Strided convolutions frequently use same padding.

Downsampling design combines stride and padding choices.

Practical Design Choices

Early CNNs:

  • Often used valid padding.

Modern CNNs:

  • Frequently use same padding for stability and simplicity.

Vision Transformers:

  • Use patch embeddings instead of convolution padding.

Edge Effects & Bias

Zero padding (same):

  • Assumes missing context equals zero.
  • May introduce boundary bias.

Alternative padding strategies:

  • Reflect padding
  • Replicate padding
  • Circular padding

These address edge artifacts differently.
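NumPy's `np.pad` exposes these strategies directly (`reflect`, `edge` for replicate, `wrap` for circular), which makes the different edge assumptions easy to compare on a 1-D example:

```python
import numpy as np

row = np.array([1, 2, 3, 4])
print(np.pad(row, 2, mode="constant"))  # zero:      [0 0 1 2 3 4 0 0]
print(np.pad(row, 2, mode="reflect"))   # reflect:   [3 2 1 2 3 4 3 2]
print(np.pad(row, 2, mode="edge"))      # replicate: [1 1 1 2 3 4 4 4]
print(np.pad(row, 2, mode="wrap"))      # circular:  [3 4 1 2 3 4 1 2]
```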

Architectural Comparison

| Aspect              | Same Padding            | Valid Padding           |
| ------------------- | ----------------------- | ----------------------- |
| Padding added       | Yes                     | No                      |
| Output size (S=1)   | Same as input           | Smaller                 |
| Edge influence      | Zero-context assumption | Edges partially ignored |
| Common in deep nets | Yes                     | Less common             |
| Spatial shrinkage   | No                      | Yes                     |

When to Use Each

Same Padding:

  • Deep CNN stacks
  • Feature alignment across layers
  • Tasks needing spatial consistency

Valid Padding:

  • When spatial shrinkage is desired
  • When avoiding artificial edge assumptions
  • Early feature compression stages

Long-Term Architectural Implications

Padding choices influence:

  • Spatial bias
  • Hierarchical feature formation
  • Model inductive bias
  • Boundary robustness

Small implementation choices accumulate across depth.

Related Concepts

  • Convolution Operation
  • Stride and Padding
  • Receptive Fields
  • Strided Convolution vs Pooling
  • Dilated Convolutions
  • Feature Maps