Variational Autoencoders (VAEs)

Short Definition

Variational Autoencoders (VAEs) are generative neural networks that learn a probabilistic latent representation of data, allowing the model to generate new samples by sampling from the learned latent distribution.

They combine neural networks with variational inference to learn structured latent spaces suitable for generative modeling.


Definition

A Variational Autoencoder extends the standard autoencoder by learning a probability distribution over latent variables instead of a single deterministic representation.

The encoder maps an input (x) to the parameters of a latent distribution:

[
(\mu, \sigma) = f_\theta(x)
]

Instead of producing a fixed latent vector, the model samples:

[
z \sim \mathcal{N}(\mu, \sigma^2)
]

The decoder then reconstructs the input:

[
\hat{x} = g_\phi(z)
]

Training optimizes the Evidence Lower Bound (ELBO):

[
\mathcal{L} =
\mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \,\|\, p(z))
]

Where:

  • (q(z|x)) = encoder distribution
  • (p(x|z)) = decoder likelihood
  • (D_{KL}) = Kullback–Leibler divergence
  • (p(z)) = prior latent distribution (usually Gaussian)
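The whole pipeline and the ELBO can be sketched in a few lines of NumPy. This is an illustrative, untrained toy: the random linear maps are hypothetical stand-ins for the encoder (f_\theta) and decoder (g_\phi), and squared error stands in for the Gaussian log-likelihood up to constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
x_dim, z_dim = 8, 2
x = rng.normal(size=x_dim)

# Encoder f_theta: random linear maps standing in for a trained network.
# They output the parameters (mu, log sigma^2) of q(z|x).
W_mu = rng.normal(size=(z_dim, x_dim))
W_logvar = rng.normal(size=(z_dim, x_dim))
mu, log_var = W_mu @ x, W_logvar @ x
sigma = np.exp(0.5 * log_var)

# Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1).
eps = rng.normal(size=z_dim)
z = mu + sigma * eps

# Decoder g_phi: another random linear map.
W_dec = rng.normal(size=(x_dim, z_dim))
x_hat = W_dec @ z

# ELBO = E[log p(x|z)] - KL(q(z|x) || p(z)).
# With a Gaussian decoder, log p(x|z) is (up to constants) the negative
# squared reconstruction error; the KL term has a closed form for a
# diagonal Gaussian measured against the N(0, 1) prior.
recon_log_lik = -0.5 * np.sum((x - x_hat) ** 2)
kl = 0.5 * np.sum(mu**2 + sigma**2 - log_var - 1.0)
elbo = recon_log_lik - kl

print(f"ELBO estimate: {elbo:.3f} (recon {recon_log_lik:.3f}, KL {kl:.3f})")
```

In a real implementation both maps are deep networks and the ELBO is maximized by gradient ascent over (\theta) and (\phi); the structure of the computation is the same.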

Core Idea

Instead of compressing inputs into fixed latent vectors, VAEs learn a continuous latent distribution.

Conceptually:

Input Data → Encoder → Latent Distribution (μ, σ) → Sample z → Decoder → Generated / Reconstructed Output

This allows the model to generate new data by sampling from the latent space.


Minimal Conceptual Illustration

Example with images:

Image → Encoder → Latent distribution (μ, σ) → Sample latent vector z → Decoder → Reconstructed image

New images can be created by sampling new latent vectors:

z ~ N(0,1) → Decoder → Generated image


Why the KL Divergence Term Exists

The KL divergence forces the latent distribution to remain close to a standard normal prior:

[
p(z) = \mathcal{N}(0,1)
]
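For a diagonal Gaussian encoder and this standard normal prior, the KL term has a well-known closed form, summed over the latent dimensions (i):

[
D_{KL}(q(z|x) \,\|\, p(z)) = \frac{1}{2} \sum_i \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right)
]

This expression is zero exactly when (\mu_i = 0) and (\sigma_i = 1) for every dimension, i.e. when the encoder's distribution matches the prior.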

This ensures the latent space is:

  • continuous
  • smooth
  • sampleable

Without this constraint, the encoder could scatter inputs across disconnected regions of latent space, and the model would behave like a regular autoencoder: good at reconstruction, but with no reliable way to sample new data.


Reparameterization Trick

Sampling from a distribution is not a differentiable operation, so it normally breaks gradient-based learning.

VAEs solve this using the reparameterization trick:

[
z = \mu + \sigma \cdot \epsilon
]

where:

[
\epsilon \sim \mathcal{N}(0,1)
]

This makes the sampling operation differentiable.
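A quick numerical check, using illustrative values for μ and σ, confirms that the reparameterized samples have the intended distribution:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters of q(z|x) for a single latent dimension.
mu, sigma = 1.5, 0.5

# Reparameterization: draw eps from a fixed N(0, 1), then apply a
# deterministic, differentiable transform. Gradients with respect to
# mu and sigma flow through this transform, not through the random draw.
eps = rng.normal(size=100_000)
z = mu + sigma * eps

# Empirically, z is distributed as N(mu, sigma^2).
print(f"sample mean ≈ {z.mean():.3f}, sample std ≈ {z.std():.3f}")
```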


Latent Space Properties

One of the major strengths of VAEs is the structure of the latent space.

Good VAE latent spaces allow:

  • smooth interpolation
  • controllable generation
  • semantic structure

Example interpolation:

Latent vector A (dog image) → interpolate → Latent vector B (cat image)

Decoding the intermediate points produces a smooth transition between the corresponding outputs.
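A minimal sketch of latent interpolation, assuming two hypothetical latent vectors `z_a` and `z_b`:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two hypothetical latent vectors, e.g. encodings of two different images.
z_a = rng.normal(size=4)
z_b = rng.normal(size=4)

# Linear interpolation in latent space; each intermediate point would be
# passed through the decoder to produce a gradual transition.
steps = np.linspace(0.0, 1.0, 5)
path = [(1 - t) * z_a + t * z_b for t in steps]

for t, z in zip(steps, path):
    print(f"t={t:.2f}  z={np.round(z, 3)}")
```

Linear interpolation is the simplest choice; spherical interpolation is sometimes preferred when the prior is a high-dimensional Gaussian, since samples concentrate near a sphere.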


Applications

Variational Autoencoders are used in many areas of machine learning.

Generative Modeling

Generate new images, text, or data samples.

Representation Learning

Learn compact latent spaces useful for downstream tasks.

Anomaly Detection

Unusual samples often reconstruct poorly.

Data Imputation

VAEs can estimate missing values in datasets.

Molecular Design

Used to generate new chemical structures.
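The anomaly-detection use above (unusual samples reconstruct poorly) can be sketched as a thresholding rule on reconstruction error. The `reconstruct` function here is a hypothetical stand-in for a trained VAE's encode-decode pass:

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruct(x):
    # Hypothetical stand-in for a trained VAE's encode-then-decode pass:
    # a model trained on data near the origin cannot reproduce inputs
    # that lie far outside that region.
    return np.clip(x, -3.0, 3.0)

inliers = rng.normal(size=(100, 8))            # typical data
outlier = rng.normal(loc=10.0, size=(1, 8))    # far from the training data

def recon_error(x):
    return np.mean((x - reconstruct(x)) ** 2, axis=1)

# Flag samples whose reconstruction error exceeds a high percentile of
# the errors observed on normal data.
threshold = np.percentile(recon_error(inliers), 99)
is_anomaly = recon_error(outlier) > threshold
print(bool(is_anomaly[0]))  # the outlier reconstructs poorly
```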


VAE vs Standard Autoencoder

Property               | Autoencoder   | VAE
---------------------- | ------------- | -------------
Latent representation  | Deterministic | Probabilistic
Generative capability  | Limited       | Strong
Latent space structure | Unconstrained | Regularized
Sampling ability       | Difficult     | Easy

VAEs are explicitly designed for generative tasks.


Limitations

Despite their strengths, VAEs have several well-known limitations.

Blurry Outputs

Reconstruction loss often produces blurred images.

Posterior Collapse

The decoder may ignore latent variables entirely.

Limited Sample Sharpness

Generative adversarial networks (GANs) often produce sharper images.


Importance in Deep Learning

VAEs were among the first deep generative models capable of learning structured latent spaces. They influenced many later generative architectures and remain widely used in probabilistic modeling and representation learning.


Summary

Variational Autoencoders are probabilistic generative models that learn structured latent spaces by combining neural networks with variational inference. By regularizing the latent distribution and enabling efficient sampling, VAEs allow models to generate new data and learn meaningful representations.


Related Concepts