Quantization

Short Definition

Quantization reduces the numerical precision of a model's parameters.

Definition

Quantization converts high-precision values (such as 32-bit floats) into lower-precision representations (such as 8-bit integers). This reduces memory usage and improves inference speed, especially on memory- or compute-constrained hardware.

Quantization is most commonly applied after training (post-training quantization), but it can also be simulated during training (quantization-aware training), which typically preserves more accuracy.
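The core operation in quantization-aware training is a "fake quantize" step: values are rounded onto the integer grid and immediately mapped back to float, so the rest of the network sees the rounding error during training. A minimal sketch, assuming symmetric 8-bit quantization; the name `fake_quantize` is illustrative, not a library API:

```python
def fake_quantize(x, scale, num_bits=8):
    """Simulate integer quantization: quantize, then immediately
    dequantize, so downstream computation sees the rounding error."""
    qmin = -(2 ** (num_bits - 1))      # -128 for 8 bits
    qmax = 2 ** (num_bits - 1) - 1     # 127 for 8 bits
    q = round(x / scale)               # snap to the integer grid
    q = max(qmin, min(qmax, q))        # clamp to the representable range
    return q * scale                   # back to float, error baked in
```

The returned value is the nearest representable point on the int8 grid; values beyond the range saturate at the clamp boundaries.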

Why It Matters

By shrinking model size and replacing floating-point arithmetic with cheaper integer operations, quantization enables deployment on edge devices, mobile hardware, and inference accelerators with limited memory or no fast floating-point units.

How It Works (Conceptually)

  • Map continuous floating-point values onto a small set of discrete integer levels
  • Trade numerical precision for lower memory use and faster arithmetic
  • Apply a scale factor (and optionally a zero point) so the integer grid covers the tensor's value range, preserving accuracy
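The three steps above can be sketched end to end: derive a scale from the tensor's range, map floats to int8 levels, then map back. A minimal pure-Python sketch, assuming symmetric quantization (no zero point); the function names are illustrative:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: the scale is chosen so the
    largest-magnitude weight maps to the edge of the int8 range."""
    scale = max(abs(w) for w in weights) / 127   # step size between levels
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [v * scale for v in q]

weights = [0.9, -0.4, 0.05, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value lies within half a step (scale / 2) of the original.
```

Because rounding is the only lossy step, the worst-case per-element error is half the scale, which is why a tight value range (and hence a small scale) matters.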

Minimal Python Example

Python
# Round to the nearest integer level and clamp to the int8 range
quantized = max(-128, min(127, round(weight / scale)))
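For tensors whose values are not centered at zero (post-ReLU activations, for example), an affine scheme adds a zero point so the full unsigned integer range is used. A sketch, assuming unsigned 8-bit quantization of a known range [lo, hi]; the names are illustrative:

```python
def affine_quantize(x, lo, hi, num_bits=8):
    """Affine (asymmetric) quantization: map [lo, hi] onto [0, 2**n - 1]."""
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax           # float step per integer level
    zero_point = round(-lo / scale)    # the integer that represents 0.0
    q = max(0, min(qmax, round(x / scale) + zero_point))
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Recover an approximate float from an integer level."""
    return (q - zero_point) * scale
```

Storing the zero point alongside the scale lets real zero be represented exactly, which matters because zero is extremely common after ReLU and padding.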

Common Pitfalls

  • Quantizing too early, before the model and its value ranges have stabilized
  • Ignoring accuracy degradation: always validate the quantized model against the full-precision baseline
  • Targeting hardware or runtimes that do not support the chosen integer format

Related Concepts

  • Model Compression
  • Deployment
  • Efficiency