Transferability of Adversarial Examples

Short Definition

Transferability is the tendency of adversarial examples crafted against one model to also fool other models.

Definition

Transferability of adversarial examples refers to the phenomenon where adversarial inputs crafted to mislead one model also cause incorrect predictions in other models, even when those models differ in architecture, parameters, or training data.

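This property makes black-box attacks practical: an attacker crafts adversarial examples on a surrogate model they control, then applies them to a target model they cannot inspect.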

Why It Matters

Transferability undermines the assumption that hiding model internals provides security. It shows that attackers do not need access to the target model's parameters or gradients to exploit its vulnerabilities.

Transferability is central to:

black-box attack feasibility
real-world adversarial risk
understanding shared weaknesses across models

It also suggests that independently trained models learn structurally similar decision boundaries.

How Transferability Works (Conceptually)

A surrogate model is trained or chosen
Adversarial examples are generated against the surrogate
The same inputs are applied to a target model
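The attack transfers if the target model misclassifies those inputs

Transfer is possible because models trained on similar data tend to learn similar representations, so a perturbation that crosses one model's decision boundary often crosses another's.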

Factors Affecting Transferability

Similarity of model architectures
Overlap in training data or preprocessing
Use of similar optimization methods
Alignment of learned features

Greater similarity generally increases transfer success, although examples can still transfer between dissimilar models.
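
One rough way to make "alignment of learned features" concrete is to compare the direction of the loss gradient with respect to the input for the surrogate and for the target. The sketch below is an illustration only: the gradient vectors are made-up values, and cosine similarity is used here as an assumed proxy for alignment, not as a definitive measure of transferability.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical input gradients at the same data point (illustrative values).
surrogate_grad = [0.8, 0.6]
similar_target_grad = [0.6, 0.8]      # points in a similar direction
dissimilar_target_grad = [0.6, -0.8]  # orthogonal to the surrogate's gradient

aligned = cosine_similarity(surrogate_grad, similar_target_grad)
unaligned = cosine_similarity(surrogate_grad, dissimilar_target_grad)
print(f"similar target:    {aligned:.2f}")
print(f"dissimilar target: {unaligned:.2f}")
```

A value near 1 means a gradient-based perturbation computed on the surrogate also pushes the target's loss upward; a value near 0 means the perturbation is close to irrelevant for the target.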

Transferability and Robustness

High transferability indicates that adversarial vulnerability is not model-specific but systemic. Defenses that reduce transferability often improve robustness across attack settings.

However, reduced transferability does not guarantee immunity.

Minimal Conceptual Example

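# conceptual transfer attack (pseudocode: attack() stands for any attack method)
# x is a clean input; the name avoids shadowing Python's built-in input()
adversarial_input = attack(surrogate_model, x)
target_prediction = target_model(adversarial_input)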
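
The snippet above can be made executable with a toy setup. The following is a minimal sketch under stated assumptions, not a benchmark: both models are logistic regressions trained on different samples of the same synthetic 2D data, the attack is one-step FGSM, and every name (make_data, train, fgsm, the eps value) is hypothetical and chosen for illustration. The perturbation budget is deliberately large so the effect is easy to see.

```python
import math
import random


def make_data(n, rng):
    """Two Gaussian blobs in 2D: class 0 near (-1, -1), class 1 near (+1, +1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        x = [center + rng.gauss(0, 0.5), center + rng.gauss(0, 0.5)]
        data.append((x, y))
    return data


def predict_proba(model, x):
    """Probability of class 1 under a logistic regression model (w, b)."""
    w, b = model
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=200, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = predict_proba((w, b), x) - y  # gradient of the loss w.r.t. the logit
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def fgsm(model, x, y, eps):
    """One-step FGSM: shift x by eps along the sign of the loss gradient w.r.t. x."""
    w, _ = model
    err = predict_proba(model, x) - y  # d(loss)/d(logit) for cross-entropy
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]


def accuracy(model, data):
    """Fraction of (x, y) pairs the model classifies correctly."""
    return sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


rng = random.Random(0)
surrogate = train(make_data(200, rng))  # attacker's own model
target = train(make_data(200, rng))     # victim model, trained on different samples
test_set = make_data(200, rng)

# The adversarial examples are crafted using the surrogate only.
adv_set = [(fgsm(surrogate, x, y, eps=1.5), y) for x, y in test_set]

clean_acc = accuracy(target, test_set)
adv_acc = accuracy(target, adv_set)
print(f"target accuracy on clean inputs:       {clean_acc:.2f}")
print(f"target accuracy on transferred inputs: {adv_acc:.2f}")
```

The target's parameters are never read when building adv_set, so any drop in its accuracy comes from transfer alone. Because the two models here share architecture and data distribution, this is close to a best case for the attacker.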

Common Pitfalls

Assuming transferability only applies to similar models
Ignoring transfer-based attacks during evaluation
Treating black-box robustness as independent of white-box robustness
Assuming transferability implies identical vulnerabilities

Transferability is probabilistic, not guaranteed.
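
Because transfer is probabilistic, it is usually reported as a rate over many examples instead of a yes/no property. The sketch below shows one possible way to compute such a rate; the function name, its arguments, and the choice to count only examples that the target classified correctly when clean and that fooled the surrogate are assumptions made for illustration, and other conventions exist.

```python
def transfer_rate(clean_correct, surrogate_fooled, target_fooled):
    """Share of adversarial examples that fool the target, counted only over
    examples the target got right when clean and that fooled the surrogate."""
    eligible = [
        t
        for c, s, t in zip(clean_correct, surrogate_fooled, target_fooled)
        if c and s
    ]
    if not eligible:
        return 0.0  # no eligible examples, so there is nothing to measure
    return sum(eligible) / len(eligible)


# Hypothetical per-example outcomes for five inputs (illustrative values).
clean_correct = [True, True, True, False, True]
surrogate_fooled = [True, True, False, True, True]
target_fooled = [True, False, True, True, True]

rate = transfer_rate(clean_correct, surrogate_fooled, target_fooled)
print(f"transfer rate: {rate:.2f}")
```

Filtering out inputs the target already misclassified keeps ordinary errors from being counted as successful transfers.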

Related Concepts

Adversarial Examples
Black-Box Attacks
White-Box Attacks
Model Robustness
Evasion Attacks