Transferability of Adversarial Examples

Short Definition

Transferability is the tendency of adversarial examples crafted against one model to also fool other models.

Definition

Transferability of adversarial examples refers to the phenomenon where adversarial inputs crafted to mislead one model also cause incorrect predictions in other models, even when those models differ in architecture, parameters, or training data.

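This property enables black-box attacks: an attacker crafts adversarial examples against a surrogate model they control, then applies them to a target model whose internals they cannot inspect.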

Why It Matters

Transferability undermines the assumption that hiding model internals provides security. It shows that attackers do not need access to the target model to exploit vulnerabilities.

Transferability is central to:

  • black-box attack feasibility
  • real-world adversarial risk
  • understanding shared weaknesses across models

Transferability also suggests that independently trained models learn structurally similar decision boundaries.

How Transferability Works (Conceptually)

  • A surrogate model is trained or chosen
  • Adversarial examples are generated against the surrogate
  • The same inputs are applied to a target model
  • If the attack transfers, the target model misclassifies the inputs

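Transfer tends to succeed when the surrogate and the target rely on similar learned representations, so that a perturbation which crosses the surrogate's decision boundary also crosses the target's.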
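The steps above can be sketched with two toy linear classifiers. This is a minimal illustration, not a realistic attack: the weights, the input, the epsilon value, and the helper names (predict, sign, fgsm_linear) are all hypothetical, and the single signed-gradient step is a simplified stand-in for whatever attack is actually used against the surrogate.

```python
# Hypothetical linear classifiers: predict 1 if w.x + b > 0, else 0.
def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(weights, x, true_label, epsilon):
    """One signed-gradient step against a linear model.

    For a linear score, the gradient with respect to the input is the
    weight vector, so stepping against sign(w) (for label 1) or along
    it (for label 0) pushes the score toward the wrong class.
    """
    direction = -1 if true_label == 1 else 1
    return [xi + direction * epsilon * sign(w) for xi, w in zip(x, weights)]


# Illustrative values only: two similar but not identical models.
surrogate_w, surrogate_b = [1.0, 0.8, -0.5], 0.0
target_w, target_b = [0.9, 1.1, -0.3], 0.1

x = [0.3, 0.2, -0.1]
true_label = 1

# The perturbation is computed from the surrogate's weights only.
x_adv = fgsm_linear(surrogate_w, x, true_label, epsilon=0.4)

print("surrogate clean/adv:", predict(surrogate_w, surrogate_b, x),
      predict(surrogate_w, surrogate_b, x_adv))
print("target clean/adv:   ", predict(target_w, target_b, x),
      predict(target_w, target_b, x_adv))
```

The target's weights are never read when building x_adv; in this toy setup the example transfers only because the two weight vectors point in roughly the same direction.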

Factors Affecting Transferability

  • Similarity of model architectures
  • Overlap in training data or preprocessing
  • Use of similar optimization methods
  • Alignment of learned features

In general, the more similar the surrogate and the target are along these dimensions, the higher the transfer success rate.
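One rough way to make "alignment" concrete is to compare the input gradients of two models on the same input. The sketch below assumes cosine similarity as the proxy and uses made-up gradient vectors; the function name and all values are hypothetical, and a high score is only suggestive of transfer, not a guarantee.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative input gradients of the loss for one input (hypothetical values).
surrogate_grad = [1.0, 0.8, -0.5]
similar_target_grad = [0.9, 1.1, -0.3]      # points in a similar direction
dissimilar_target_grad = [-0.2, 0.1, 1.0]   # points in a different direction

aligned = cosine_similarity(surrogate_grad, similar_target_grad)
misaligned = cosine_similarity(surrogate_grad, dissimilar_target_grad)
print(f"aligned: {aligned:.2f}, misaligned: {misaligned:.2f}")
```

A perturbation that follows the surrogate's gradient moves the aligned target in nearly the same harmful direction, while for the misaligned target much of the step is wasted or even counterproductive.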

Transferability and Robustness

High transferability indicates that adversarial vulnerability is not model-specific but systemic. Defenses that reduce transferability often improve robustness across attack settings.

However, reduced transferability does not guarantee immunity.

Minimal Conceptual Example

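# conceptual transfer attack (attack, surrogate_model, target_model are placeholders)
adversarial_input = attack(surrogate_model, clean_input)  # crafted using only the surrogate
target_prediction = target_model(adversarial_input)       # target is queried, never inspected
transferred = target_prediction != true_label             # success if the target is also fooled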

Common Pitfalls

  • Assuming transferability only applies to similar models
  • Ignoring transfer-based attacks during evaluation
  • Treating black-box robustness as independent of white-box robustness
  • Assuming transferability implies identical vulnerabilities

Transferability is probabilistic, not guaranteed: typically only a fraction of the examples that fool the surrogate also fool the target.
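Because transfer is a matter of degree, evaluations usually report it as a rate. The sketch below assumes one common convention, the fraction of examples that fool the surrogate and also fool the target; the function name and the prediction lists are hypothetical.

```python
def transfer_rate(surrogate_preds, target_preds, true_labels):
    """Fraction of surrogate-fooling examples that also fool the target."""
    fooled_surrogate = [
        i for i, (pred, label) in enumerate(zip(surrogate_preds, true_labels))
        if pred != label
    ]
    if not fooled_surrogate:
        return 0.0  # nothing fooled the surrogate, so nothing could transfer
    also_fooled_target = sum(
        1 for i in fooled_surrogate if target_preds[i] != true_labels[i]
    )
    return also_fooled_target / len(fooled_surrogate)


# Illustrative predictions on five adversarial inputs (made-up values).
true_labels     = [1, 0, 1, 1, 0]
surrogate_preds = [0, 1, 0, 1, 1]  # surrogate is fooled on indices 0, 1, 2, 4
target_preds    = [0, 0, 0, 1, 1]  # target is fooled on indices 0, 2, 4

rate = transfer_rate(surrogate_preds, target_preds, true_labels)
print(rate)  # 0.75
```

Other denominators are possible (for example, all generated inputs rather than only those that fool the surrogate), so reported rates are comparable only when the convention is stated.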

Related Concepts

  • Adversarial Examples
  • Black-Box Attacks
  • White-Box Attacks
  • Model Robustness
  • Evasion Attacks