Adversarial Examples

Short Definition

Adversarial examples are inputs deliberately modified to cause a model to make incorrect predictions.

Definition

Adversarial examples are inputs that have been intentionally perturbed—often by imperceptibly small changes—to induce incorrect predictions from a trained model. These perturbations are typically crafted to exploit vulnerabilities in the model’s decision boundary while preserving semantic similarity to the original input.

Although adversarial examples may look unchanged to humans, they can reliably cause confident misclassification by neural networks.
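
A minimal sketch of this definition as an executable check, assuming an L-infinity perturbation budget (the name is_adversarial and its signature are illustrative, not a standard API):

# An input counts as adversarial if it stays within an epsilon budget
# of the original yet changes the model's prediction.
import numpy as np

def is_adversarial(predict, x, x_adv, epsilon):
    within_budget = np.max(np.abs(x_adv - x)) <= epsilon  # L-infinity distance
    prediction_changed = predict(x_adv) != predict(x)
    return within_budget and prediction_changed

Other norms (L2, L0) define analogous budgets under the same idea.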

Why It Matters

Adversarial examples expose fundamental weaknesses in model generalization and robustness. They demonstrate that high accuracy on standard test data does not guarantee reliable behavior under slight, structured perturbations.

This has serious implications for safety-critical and security-sensitive systems: small physical perturbations such as stickers on road signs have been shown to fool vision models, and adversarially modified files can evade learned malware detectors.

How It Works (Conceptually)

  • A trained model defines a decision boundary in input space
  • Small, targeted perturbations move inputs across that boundary
  • The model’s prediction changes despite minimal input change
  • Perturbations are often computed using gradient information

Adversarial examples reveal that learned representations can be fragile; the sketch below makes this concrete for a linear model.
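
A minimal sketch of boundary crossing, using a hand-set linear classifier in NumPy (the weights, bias, and input are illustrative assumptions, not a trained model):

# For a linear classifier, the gradient of the score with respect to the
# input is just the weight vector w, so stepping epsilon against sign(w)
# (scaled by the current label) pushes the input toward the boundary.
import numpy as np

w = np.array([1.0, -2.0])  # hand-set weights
b = 0.1                    # hand-set bias
x = np.array([0.5, 0.2])   # clean input, classified as +1

def predict(v):
    return np.sign(w @ v + b)

epsilon = 0.15
x_adv = x - epsilon * np.sign(w) * predict(x)

print(predict(x), predict(x_adv))  # 1.0 -1.0: the prediction flips

No coordinate changed by more than 0.15, yet the prediction flips; in higher dimensions the per-coordinate budget needed can be far smaller.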

Common Types

  • Evasion attacks: perturb inputs at inference time
  • Targeted attacks: force prediction into a specific class
  • Untargeted attacks: cause any incorrect prediction
  • White-box attacks: assume full model access
  • Black-box attacks: rely on query-based or transfer methods

These labels describe different axes of an attack (goal, attacker knowledge, timing) and can combine; each probes a different aspect of model robustness. The sketch below contrasts the targeted and untargeted objectives.
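
A sketch of how targeted and untargeted attacks differ in their objective, assuming a PyTorch classifier model (one common single-step formulation; fgsm_step is an illustrative helper, not a library function):

# Untargeted: increase the loss of the true label.
# Targeted: decrease the loss of a chosen target label.
import torch
import torch.nn.functional as F

def fgsm_step(model, x, label, epsilon, targeted=False):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    step = -epsilon if targeted else epsilon
    return (x + step * x.grad.sign()).detach()

# x_adv = fgsm_step(model, x, y_true, 0.03)                   # untargeted
# x_adv = fgsm_step(model, x, y_target, 0.03, targeted=True)  # targeted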

Minimal Python Example

# Conceptual illustration of the fast gradient sign method (FGSM):
# nudge the input by epsilon in the direction that increases the loss.
adversarial_input = input + epsilon * sign(gradient(loss, input))
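
A runnable version of the same step in PyTorch (a minimal sketch: the stand-in model, input, and label below are illustrative, not a real trained setup):

# One FGSM step against a stand-in classifier.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # untrained stand-in
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)  # stand-in image with pixels in [0, 1]
y = torch.tensor([3])         # stand-in label
epsilon = 0.03                # perturbation budget

x.requires_grad_(True)
loss_fn(model(x), y).backward()

# Step epsilon in the sign of the input gradient, then keep pixels valid.
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()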


Common Pitfalls

  • Treating adversarial examples as rare edge cases
  • Assuming robustness to random noise implies adversarial robustness (see the comparison after this list)
  • Evaluating robustness only on clean test data
  • Confusing adversarial examples with random corruption
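
A sketch of why noise robustness and adversarial robustness differ, using a toy high-dimensional linear model with hand-set weights (all values here are illustrative):

# Random noise at a given L-infinity budget almost never flips the
# prediction, while the worst-case perturbation at the same budget does.
import numpy as np

rng = np.random.default_rng(0)
d, epsilon = 100, 0.05
w = rng.choice([-1.0, 1.0], size=d)  # hand-set weights
b = 1.0                              # clean input sits at margin 1.0
x = np.zeros(d)                      # clean input, classified as +1

def predict(v):
    return np.sign(w @ v + b)

noise_flips = sum(
    predict(x + rng.uniform(-epsilon, epsilon, d)) != predict(x)
    for _ in range(1000)
)
print(f"random-noise flips: {noise_flips}/1000")  # typically 0

x_adv = x - epsilon * np.sign(w)   # worst-case perturbation, same budget
print(predict(x), predict(x_adv))  # 1.0 -1.0: always flips here

Random perturbations average out across coordinates, while the adversarial direction aligns every coordinate against the margin.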

Related Concepts

  • Model Robustness
  • Generalization
  • Out-of-Distribution Data
  • Distribution Shift
  • Uncertainty Estimation
  • Adversarial Training