Same vs Valid Padding

Short Definition

Same padding preserves the spatial dimensions of an input after convolution (at stride 1), while valid padding applies no padding and produces a smaller output.

Same = preserve size.
Valid = no padding, shrink size.

Definition

In convolutional neural networks (CNNs), padding determines how the convolutional kernel interacts with the borders of an input.

Two common padding strategies are:

  1. Same Padding
  2. Valid Padding

Padding influences:

  • Output dimensions
  • Edge behavior
  • Receptive field coverage
  • Information retention at boundaries

I. Valid Padding

Valid padding applies no padding to the input.

The kernel is applied only at positions where it fully overlaps the input.

Output size:

[
\text{Output} = \left\lfloor \frac{N - K}{S} \right\rfloor + 1
]

Where:

  • N = input size
  • K = kernel size
  • S = stride

Result:

Output size is smaller than input.

Example:

Input: 5×5
Kernel: 3×3
Stride: 1

Output: 3×3

Valid padding reduces spatial resolution.
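The valid-padding formula above can be sketched as a small helper (the function name is illustrative, not from any library):

```python
def valid_output_size(n: int, k: int, s: int = 1) -> int:
    """Output size of a valid (no-padding) convolution: floor((N - K) / S) + 1."""
    return (n - k) // s + 1

print(valid_output_size(5, 3, 1))  # 3, matching the 5x5 input / 3x3 kernel example
```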

II. Same Padding

Same padding adds zeros around the input so that the output size equals the input size (for stride = 1).

Padding size is chosen so:

[
\text{Output} = N
]

For odd kernel size K:

[
\text{Padding} = \frac{K - 1}{2}
]

Example:

Input: 5×5
Kernel: 3×3
Stride: 1

Padding: 1 pixel on each side

Output: 5×5

Same padding preserves dimensions.
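The same-padding rule for odd kernels can be checked numerically. A minimal sketch (helper names are illustrative) using the general output-size formula floor((N + 2P - K) / S) + 1:

```python
def same_padding(k: int) -> int:
    """Per-side zero padding that preserves size for an odd kernel K at stride 1."""
    return (k - 1) // 2

def padded_output_size(n: int, k: int, p: int, s: int = 1) -> int:
    """Convolution output size with padding P: floor((N + 2P - K) / S) + 1."""
    return (n + 2 * p - k) // s + 1

p = same_padding(3)
print(p)                            # 1 pixel on each side
print(padded_output_size(5, 3, p))  # 5 -- the 5x5 input is preserved
```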

Minimal Conceptual Illustration


Valid:
[Input] → Convolution → Smaller output

Same:
[Input] → Zero padding → Convolution → Same size output

Padding determines border behavior.
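The two pipelines above can be demonstrated with a naive NumPy convolution (stride 1, square inputs; written for shape illustration, not performance):

```python
import numpy as np

def conv2d(x, kernel, padding=0):
    """Naive 2-D convolution (stride 1) to illustrate output shapes."""
    if padding:
        x = np.pad(x, padding)  # zero padding on every side
    n, k = x.shape[0], kernel.shape[0]
    out = n - k + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            result[i, j] = np.sum(x[i:i + k, j:j + k] * kernel)
    return result

x = np.ones((5, 5))
kernel = np.ones((3, 3))
print(conv2d(x, kernel).shape)             # (3, 3)  valid: smaller output
print(conv2d(x, kernel, padding=1).shape)  # (5, 5)  same: size preserved
```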

Why Padding Matters

1. Feature Map Size Control

Same padding:

  • Preserves resolution.
  • Easier stacking of layers.
  • Common in deep CNNs.

Valid padding:

  • Shrinks feature maps.
  • Useful for progressive spatial compression.

2. Edge Information

Valid padding:

  • Partially ignores border pixels, since they appear in fewer kernel positions.
  • Avoids artificial boundary values, reducing edge artifacts.

Same padding:

  • Introduces artificial zero context.
  • May distort edge activations.

Padding changes how edge features are treated.

Relationship to Receptive Fields

Valid padding:

  • Shrinks spatial dimension.
  • Receptive field grows relative to feature map size.

Same padding:

  • Keeps spatial size constant.
  • Receptive field grows more gradually.

Padding affects hierarchical representation scaling.
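For a stack of identical stride-1 convolutions, the receptive field grows by K − 1 per layer. A small sketch of the standard recurrence (the function is illustrative, assuming equal kernels and strides throughout the stack):

```python
def receptive_field(num_layers: int, k: int = 3, s: int = 1) -> int:
    """Receptive field of a stack of identical KxK conv layers.

    Standard recurrence: each layer adds (K - 1) * jump, where jump is the
    product of all earlier strides. For stride 1 this reduces to 1 + L*(K-1).
    """
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (k - 1) * jump
        jump *= s
    return rf

print([receptive_field(l) for l in (1, 2, 3)])  # [3, 5, 7]
```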

Interaction with Stride

If stride > 1:

Same padding no longer strictly preserves size.

General formula:

[
\text{Output} = \left\lceil \frac{N}{S} \right\rceil
]

Same padding preserves relative scale but not exact size for S > 1.

Stride and padding interact jointly.
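The ceiling formula can be sketched directly (this matches, e.g., TensorFlow's "SAME" convention; the function name is illustrative):

```python
import math

def same_output_size(n: int, s: int) -> int:
    """Same-padding output size at stride S: ceil(N / S)."""
    return math.ceil(n / s)

print(same_output_size(5, 1))  # 5 -- size preserved at stride 1
print(same_output_size(5, 2))  # 3 -- size no longer preserved for S > 1
```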

Relationship to Strided Convolution vs Pooling

Pooling often uses valid-style behavior (no padding).

Strided convolutions frequently use same padding.

Downsampling design combines stride and padding choices.

Practical Design Choices

Early CNNs:

  • Often used valid padding.

Modern CNNs:

  • Frequently use same padding for stability and simplicity.

Vision Transformers:

  • Use patch embeddings instead of convolution padding.

Edge Effects & Bias

Zero padding (same):

  • Assumes missing context equals zero.
  • May introduce boundary bias.

Alternative padding strategies:

  • Reflect padding
  • Replicate padding
  • Circular padding

These address edge artifacts differently.
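NumPy's `np.pad` exposes these strategies directly (`reflect`, `edge` for replicate, `wrap` for circular), which makes the different edge assumptions easy to compare on a 1-D example:

```python
import numpy as np

row = np.array([1, 2, 3, 4])
print(np.pad(row, 2, mode="constant"))  # zero:      [0 0 1 2 3 4 0 0]
print(np.pad(row, 2, mode="reflect"))   # reflect:   [3 2 1 2 3 4 3 2]
print(np.pad(row, 2, mode="edge"))      # replicate: [1 1 1 2 3 4 4 4]
print(np.pad(row, 2, mode="wrap"))      # circular:  [3 4 1 2 3 4 1 2]
```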

Architectural Comparison

| Aspect              | Same Padding            | Valid Padding           |
| ------------------- | ----------------------- | ----------------------- |
| Padding added       | Yes                     | No                      |
| Output size (S=1)   | Same as input           | Smaller                 |
| Edge influence      | Zero-context assumption | Edges partially ignored |
| Common in deep nets | Yes                     | Less common             |
| Spatial shrinkage   | No                      | Yes                     |

When to Use Each

Same Padding:

  • Deep CNN stacks
  • Feature alignment across layers
  • Tasks needing spatial consistency

Valid Padding:

  • When spatial shrinkage is desired
  • When avoiding artificial edge assumptions
  • Early feature compression stages

Long-Term Architectural Implications

Padding choices influence:

  • Spatial bias
  • Hierarchical feature formation
  • Model inductive bias
  • Boundary robustness

Small implementation choices accumulate across depth.

Related Concepts

  • Convolution Operation
  • Stride and Padding
  • Receptive Fields
  • Strided Convolution vs Pooling
  • Dilated Convolutions
  • Feature Maps