Evasion Attacks

Short Definition

Evasion attacks manipulate inputs at inference time to cause incorrect model predictions.

Definition

Evasion attacks are adversarial attacks in which an attacker modifies inputs at prediction time—after the model has been trained—to induce misclassification. The model itself is not altered; instead, the attacker exploits weaknesses in the learned decision boundary.

Evasion attacks are the most commonly studied form of adversarial attack.

Why It Matters

Most deployed machine learning systems operate in environments where inputs can be influenced or controlled by users. Evasion attacks demonstrate that models can be manipulated without access to training data or training processes.

They pose serious risks for real-world systems such as:

  • image recognition
  • biometric authentication
  • spam and malware detection
  • content moderation

Evasion attacks undermine the common assumption that inputs arriving at a deployed model are benign and well-formed.

How Evasion Attacks Work (Conceptually)

  • The attacker observes or approximates the model’s behavior
  • Small, structured perturbations are applied to inputs
  • The perturbed input crosses a decision boundary
  • The model produces an incorrect prediction

These perturbations are often imperceptible to humans.
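The steps above can be sketched with the fast gradient sign method (FGSM), one classic white-box instantiation, on a toy linear classifier. The weights, input, and epsilon below are invented for illustration, not taken from any real system:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "trained" linear classifier; weights are hypothetical.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x):
    return sigmoid(w @ x + b)   # probability of the positive class

# A clean input the model confidently labels positive (true label y = 1).
x = np.array([0.8, -0.4, 0.3])
y = 1.0

# Step 1 (white-box): gradient of the logistic loss w.r.t. the INPUT,
# which for this model is (predict(x) - y) * w.
grad = (predict(x) - y) * w

# Step 2: apply a small, structured perturbation (FGSM: eps * sign of gradient).
eps = 0.9
x_adv = x + eps * np.sign(grad)

# Steps 3-4: the perturbed input crosses the decision boundary
# and the prediction flips from positive to negative.
print(predict(x), predict(x_adv))
```

Each feature moves by at most eps, yet the combined shift is aligned with the loss gradient, which is exactly what makes the perturbation "structured" rather than random.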

Key Characteristics

  • Occur at inference time
  • Do not modify the training dataset
  • Can be white-box or black-box
  • Often rely on gradient information or transferability

Evasion attacks test runtime robustness.
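The black-box case can be sketched without any gradient access: the attacker only queries the model's output score and searches greedily. The model, weights, and budget below are hypothetical, chosen only to make the sketch self-contained:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for a deployed model; the attacker sees only this score.
w = np.array([1.5, -2.0, 0.5])
def query(x):
    return sigmoid(w @ x + 0.1)

def greedy_black_box(x, eps=0.9, step=0.1, iters=50):
    """Query-only attack: repeatedly take the single-feature step that
    lowers the returned score most, staying inside an eps-box around x."""
    x_adv = x.copy()
    for _ in range(iters):
        best_score, best_trial = query(x_adv), None
        for i in range(len(x)):
            for d in (-step, step):
                trial = x_adv.copy()
                trial[i] = np.clip(trial[i] + d, x[i] - eps, x[i] + eps)
                if query(trial) < best_score:
                    best_score, best_trial = query(trial), trial
        if best_trial is None:   # no single step improves the score: stop
            break
        x_adv = best_trial
    return x_adv

x = np.array([0.8, -0.4, 0.3])   # scored as confidently positive
x_adv = greedy_black_box(x)
print(query(x), query(x_adv))    # score drops below 0.5, no gradients used
```

Real black-box attacks are far more query-efficient (e.g., substitute models plus transferability), but the sketch shows the core point: output access alone can suffice.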

Evasion vs Other Attack Types

  • Evasion attacks: inference-time manipulation
  • Poisoning attacks: training-time data manipulation

This distinction separates runtime exploitation from training corruption.

Minimal Conceptual Example

# inference-time manipulation: the trained model itself is untouched
adversarial_input = clean_input + small_perturbation  # crafted, not random
prediction = model(adversarial_input)                 # misclassified

Common Pitfalls

  • Assuming evasion attacks are unrealistic
  • Confusing evasion attacks with random noise
  • Evaluating robustness only under clean inputs
  • Ignoring confidence behavior during evasion

Evasion attacks exploit worst-case inputs, not average behavior.
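Two of these pitfalls are measurable even on a toy model: random noise of a given magnitude is far less damaging than a worst-case perturbation of the same magnitude, and clean accuracy reveals neither. All weights and inputs below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.5, -2.0, 0.5])   # hypothetical trained linear model
b = 0.1

def is_correct(x):
    return (w @ x + b) > 0       # every example below has true label 1

# A small synthetic batch of clean, correctly classified inputs.
X = np.array([0.8, -0.4, 0.3]) + 0.05 * rng.standard_normal((200, 3))
eps = 0.9

clean_acc = np.mean([is_correct(x) for x in X])
# Random sign noise of magnitude eps: average case, most predictions survive.
noise_acc = np.mean([is_correct(x + eps * rng.choice([-1.0, 1.0], size=3)) for x in X])
# Worst-case perturbation of the SAME magnitude (gradient sign for this model).
adv_acc = np.mean([is_correct(x - eps * np.sign(w)) for x in X])

print(clean_acc, noise_acc, adv_acc)
```

The noise and the adversarial perturbation have identical per-feature magnitude; only the direction differs, which is why robustness must be evaluated against worst-case inputs, not clean or randomly noised ones.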

Related Concepts