Adversarial Examples

Short Definition

Adversarial examples are inputs deliberately modified to cause a model to make incorrect predictions.

Definition

Adversarial examples are inputs that have been intentionally perturbed—often by imperceptibly small changes—to induce incorrect predictions from a trained model. These perturbations are typically crafted to exploit vulnerabilities in the model’s decision boundary while preserving semantic similarity to the original input.

Although adversarial examples may look unchanged to humans, they can reliably cause confident misclassification by neural networks.
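
A minimal sketch of this definition as an executable check, assuming an L-infinity perturbation budget (the name is_adversarial and its signature are illustrative, not a standard API):

# An input counts as adversarial if it stays within an epsilon budget
# of the original yet changes the model's prediction.
import numpy as np

def is_adversarial(predict, x, x_adv, epsilon):
    within_budget = np.max(np.abs(x_adv - x)) <= epsilon  # L-infinity distance
    prediction_changed = predict(x_adv) != predict(x)
    return within_budget and prediction_changed

Other norms (L2, L0) define analogous budgets under the same idea.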

Why It Matters

Adversarial examples expose fundamental weaknesses in model generalization and robustness. They demonstrate that high accuracy on standard test data does not guarantee reliable behavior under slight, structured perturbations.

This has serious implications for safety-critical and security-sensitive systems: small physical perturbations such as stickers on road signs have been shown to fool vision models, and adversarially modified files can evade learned malware detectors.

How It Works (Conceptually)

  • A trained model defines a decision boundary in input space
  • Small, targeted perturbations move inputs across that boundary
  • The model’s prediction changes despite minimal input change
  • Perturbations are often computed using gradient information

Adversarial examples reveal that learned representations can be fragile; the sketch below makes this concrete for a linear model.
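
A minimal sketch of boundary crossing, using a hand-set linear classifier in NumPy (the weights, bias, and input are illustrative assumptions, not a trained model):

# For a linear classifier, the gradient of the score with respect to the
# input is just the weight vector w, so stepping epsilon against sign(w)
# (scaled by the current label) pushes the input toward the boundary.
import numpy as np

w = np.array([1.0, -2.0])  # hand-set weights
b = 0.1                    # hand-set bias
x = np.array([0.5, 0.2])   # clean input, classified as +1

def predict(v):
    return np.sign(w @ v + b)

epsilon = 0.15
x_adv = x - epsilon * np.sign(w) * predict(x)

print(predict(x), predict(x_adv))  # 1.0 -1.0: the prediction flips

No coordinate changed by more than 0.15, yet the prediction flips; in higher dimensions the per-coordinate budget needed can be far smaller.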

Common Types

  • Evasion attacks: perturb inputs at inference time
  • Targeted attacks: force prediction into a specific class
  • Untargeted attacks: cause any incorrect prediction
  • White-box attacks: assume full model access
  • Black-box attacks: rely on query-based or transfer methods

These labels describe different axes of an attack (goal, attacker knowledge, timing) and can combine; each probes a different aspect of model robustness. The sketch below contrasts the targeted and untargeted objectives.
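
A sketch of how targeted and untargeted attacks differ in their objective, assuming a PyTorch classifier model (one common single-step formulation; fgsm_step is an illustrative helper, not a library function):

# Untargeted: increase the loss of the true label.
# Targeted: decrease the loss of a chosen target label.
import torch
import torch.nn.functional as F

def fgsm_step(model, x, label, epsilon, targeted=False):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    step = -epsilon if targeted else epsilon
    return (x + step * x.grad.sign()).detach()

# x_adv = fgsm_step(model, x, y_true, 0.03)                   # untargeted
# x_adv = fgsm_step(model, x, y_target, 0.03, targeted=True)  # targeted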

Minimal Python Example

# Conceptual illustration of the fast gradient sign method (FGSM):
# nudge the input by epsilon in the direction that increases the loss.
adversarial_input = input + epsilon * sign(gradient(loss, input))
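
A runnable version of the same step in PyTorch (a minimal sketch: the stand-in model, input, and label below are illustrative, not a real trained setup):

# One FGSM step against a stand-in classifier.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # untrained stand-in
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)  # stand-in image with pixels in [0, 1]
y = torch.tensor([3])         # stand-in label
epsilon = 0.03                # perturbation budget

x.requires_grad_(True)
loss_fn(model(x), y).backward()

# Step epsilon in the sign of the input gradient, then keep pixels valid.
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()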


Common Pitfalls

  • Treating adversarial examples as rare edge cases
  • Assuming robustness to random noise implies adversarial robustness (see the comparison after this list)
  • Evaluating robustness only on clean test data
  • Confusing adversarial examples with random corruption
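
A sketch of why noise robustness and adversarial robustness differ, using a toy high-dimensional linear model with hand-set weights (all values here are illustrative):

# Random noise at a given L-infinity budget almost never flips the
# prediction, while the worst-case perturbation at the same budget does.
import numpy as np

rng = np.random.default_rng(0)
d, epsilon = 100, 0.05
w = rng.choice([-1.0, 1.0], size=d)  # hand-set weights
b = 1.0                              # clean input sits at margin 1.0
x = np.zeros(d)                      # clean input, classified as +1

def predict(v):
    return np.sign(w @ v + b)

noise_flips = sum(
    predict(x + rng.uniform(-epsilon, epsilon, d)) != predict(x)
    for _ in range(1000)
)
print(f"random-noise flips: {noise_flips}/1000")  # typically 0

x_adv = x - epsilon * np.sign(w)   # worst-case perturbation, same budget
print(predict(x), predict(x_adv))  # 1.0 -1.0: always flips here

Random perturbations average out across coordinates, while the adversarial direction aligns every coordinate against the margin.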

Related Concepts

  • Model Robustness
  • Generalization
  • Out-of-Distribution Data
  • Distribution Shift
  • Uncertainty Estimation
  • Adversarial Training