Short Definition
Same padding preserves the spatial dimensions of an input after convolution, while valid padding applies no padding and reduces the output size.
Same = preserve size.
Valid = no padding, shrink size.
Definition
In convolutional neural networks (CNNs), padding determines how the convolutional kernel interacts with the borders of an input.
Two common padding strategies are:
- Same Padding
- Valid Padding
Padding influences:
- Output dimensions
- Edge behavior
- Receptive field coverage
- Information retention at boundaries
I. Valid Padding
Valid padding applies no padding to the input.
Convolution only occurs where the kernel fully overlaps the input.
Output size:
[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]
Where:
- N = input size
- K = kernel size
- S = stride
Result:
Output size is smaller than input.
Example:
Input: 5×5
Kernel: 3×3
Stride: 1
Output: 3×3
Valid padding reduces spatial resolution.
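The shrinking behavior above can be sketched with a naive valid convolution in NumPy (the helper name `conv2d_valid` is illustrative, not a library function; it computes cross-correlation, as deep learning frameworks do):

```python
import numpy as np

def conv2d_valid(x, k, stride=1):
    """Naive 2D valid convolution (cross-correlation):
    the kernel is only applied where it fully overlaps the input."""
    n, kk = x.shape[0], k.shape[0]
    out = (n - kk) // stride + 1  # floor((N - K) / S) + 1
    y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            y[i, j] = np.sum(x[i*stride:i*stride+kk, j*stride:j*stride+kk] * k)
    return y

x = np.ones((5, 5))   # 5x5 input
k = np.ones((3, 3))   # 3x3 kernel, stride 1
print(conv2d_valid(x, k).shape)  # (3, 3), matching the example above
```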
II. Same Padding
Same padding adds zeros around the input so that the output size equals the input size (for stride = 1).
Padding size is chosen so:
[
\text{Output} = N
]
For odd kernel size K:
[
\text{Padding} = \frac{K - 1}{2}
]
Example:
Input: 5×5
Kernel: 3×3
Stride: 1
Padding: 1 pixel on each side
Output: 5×5
Same padding preserves dimensions.
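The padding rule above can be verified directly: pad with (K - 1)/2 zeros per side, and a stride-1 valid convolution then returns the original size. A minimal sketch using NumPy (the helper name `same_pad` is illustrative):

```python
import numpy as np

def same_pad(x, k):
    """Zero-pad so a stride-1 convolution with an odd KxK kernel
    preserves spatial size: P = (K - 1) // 2 per side."""
    p = (k - 1) // 2
    return np.pad(x, p, mode="constant")  # zeros around the border

x = np.arange(25.0).reshape(5, 5)  # 5x5 input
padded = same_pad(x, 3)            # 1 pixel of zeros on each side
print(padded.shape)                # (7, 7); a 3x3 valid conv on 7x7 yields 5x5
```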
Minimal Conceptual Illustration
Valid:
[Input] → Convolution → Smaller output
Same:
[Input] → Zero padding → Convolution → Same size output
Padding determines border behavior.
Why Padding Matters
1. Feature Map Size Control
Same padding:
- Preserves resolution.
- Easier stacking of layers.
- Common in deep CNNs.
Valid padding:
- Shrinks feature maps.
- Useful for progressive spatial compression.
2. Edge Information
Valid padding:
- Under-weights border pixels (they appear in fewer kernel windows).
- Avoids introducing artificial boundary values.
Same padding:
- Introduces artificial zero context at borders.
- May distort edge activations.
Padding changes how edge features are treated.
Relationship to Receptive Fields
Valid padding:
- Shrinks spatial dimension.
- Receptive field grows relative to feature map size.
Same padding:
- Keeps spatial size constant.
- Receptive field grows more gradually.
Padding affects hierarchical representation scaling.
Interaction with Stride
If stride > 1:
Same padding no longer strictly preserves size.
General formula:
[
\text{Output} = \left\lceil \frac{N}{S} \right\rceil
]
Same padding preserves relative scale but not exact size for S > 1.
Stride and padding interact jointly.
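The ceiling formula above is easy to check numerically (a minimal sketch; `same_output_size` is an illustrative helper, not a framework API):

```python
import math

def same_output_size(n, s):
    """Output size of a same-padded convolution with stride s: ceil(n / s)."""
    return math.ceil(n / s)

print(same_output_size(5, 1))    # 5: stride 1 preserves size exactly
print(same_output_size(5, 2))    # 3: stride 2 only roughly halves size
print(same_output_size(224, 2))  # 112: typical downsampling step
```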
Relationship to Strided Convolution vs Pooling
Pooling often uses valid-style behavior (no padding).
Strided convolutions frequently use same padding.
Downsampling design combines stride and padding choices.
Practical Design Choices
Early CNNs:
- Often used valid padding.
Modern CNNs:
- Frequently use same padding for stability and simplicity.
Vision Transformers:
- Use patch embeddings instead of convolution padding.
Edge Effects & Bias
Zero padding (same):
- Assumes missing context equals zero.
- May introduce boundary bias.
Alternative padding strategies:
- Reflect padding
- Replicate padding
- Circular padding
These address edge artifacts differently.
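NumPy's `np.pad` exposes all of these strategies as modes, which makes the differences easy to inspect (`constant` corresponds to zero padding, `reflect` to reflect padding, `edge` to replicate padding, and `wrap` to circular padding):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# 'constant' (zeros) matches same padding's assumption;
# the other modes fill the border from the input itself.
for mode in ("constant", "reflect", "edge", "wrap"):
    print(mode)
    print(np.pad(x, 1, mode=mode))
```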
Architectural Comparison
| Aspect | Same Padding | Valid Padding |
|---|---|---|
| Padding added | Yes | No |
| Output size (S=1) | Same as input | Smaller |
| Edge influence | Zero-context assumption | Edge ignored partially |
| Common in deep nets | Yes | Less common |
| Spatial shrinkage | No | Yes |
When to Use Each
Same Padding:
- Deep CNN stacks
- Feature alignment across layers
- Tasks needing spatial consistency
Valid Padding:
- When spatial shrinkage is desired
- When avoiding artificial edge assumptions
- Early feature compression stages
Long-Term Architectural Implications
Padding choices influence:
- Spatial bias
- Hierarchical feature formation
- Model inductive bias
- Boundary robustness
Small implementation choices accumulate across depth.
Related Concepts
- Convolution Operation
- Stride and Padding
- Receptive Fields
- Strided Convolution vs Pooling
- Dilated Convolutions
- Feature Maps