Auto-Encoding Variational Bayes

Diederik P. Kingma, Max Welling

2013 · ICLR

Problem

Framing

Mean-field VB for directed latent-variable models fails once $p_{\theta}(\mathbf{x})$, $p_{\theta}(\mathbf{z}\mid\mathbf{x})$, and posterior expectations are all intractable. The paper closes this gap with a reparameterized lower-bound estimator plus an amortized recognition model, replacing per-datapoint iterative inference with a single encoder pass.

Currently Used Methods

Foundational

Proposed Method

Architecture

The model factorizes as $p_{\theta}(\mathbf{z})\,p_{\theta}(\mathbf{x}\mid\mathbf{z})$ with an amortized posterior $q_{\phi}(\mathbf{z}\mid\mathbf{x})$. In the VAE instantiation, encoder and decoder are single-hidden-layer MLPs; the encoder outputs diagonal-Gaussian $\boldsymbol{\mu}(\mathbf{x})$ and $\boldsymbol{\sigma}(\mathbf{x})$, and the decoder outputs Bernoulli or Gaussian observation parameters.
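To make the shapes concrete, here is a minimal pure-Python sketch of one encoder pass and one Bernoulli decoder pass. Everything named here is an assumption for illustration rather than the authors' code: the helper names (`linear`, `init_layer`, `encode`, `decode_bernoulli`), the tanh hidden activation, the choice to have the encoder emit a log-variance that is then converted to $\boldsymbol{\sigma}$, and the toy layer sizes.

```python
import math
import random

def linear(W, b, v):
    """Affine map W v + b for a list-of-rows matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def init_layer(n_out, n_in, rng, scale=0.01):
    """Small random weights and zero biases (illustrative initialization)."""
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def encode(x, params):
    """Single-hidden-layer encoder returning (mu, sigma) of a diagonal Gaussian."""
    h = [math.tanh(a) for a in linear(*params["hidden"], x)]
    mu = linear(*params["mu"], h)
    log_var = linear(*params["log_var"], h)  # assumed parameterization
    sigma = [math.exp(0.5 * lv) for lv in log_var]  # always positive
    return mu, sigma

def decode_bernoulli(z, params):
    """Single-hidden-layer decoder returning per-dimension Bernoulli means."""
    h = [math.tanh(a) for a in linear(*params["hidden"], z)]
    return [1.0 / (1.0 + math.exp(-a)) for a in linear(*params["out"], h)]

# Toy sizes, chosen only for the example.
rng = random.Random(0)
x_dim, h_dim, z_dim = 4, 3, 2
enc = {
    "hidden": init_layer(h_dim, x_dim, rng),
    "mu": init_layer(z_dim, h_dim, rng),
    "log_var": init_layer(z_dim, h_dim, rng),
}
dec = {
    "hidden": init_layer(h_dim, z_dim, rng),
    "out": init_layer(x_dim, h_dim, rng),
}
x = [1.0, 0.0, 1.0, 0.0]
mu, sigma = encode(x, enc)
probs = decode_bernoulli(mu, dec)  # decoding the posterior mean, for illustration
```

Emitting a log-variance and exponentiating keeps $\boldsymbol{\sigma}$ positive without a constraint on the weights, which is why the sketch takes that route.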

Directed graphical model: latent variable $z$, observed $x$, solid generative edges for $p_{\theta}(z)\,p_{\theta}(x\mid z)$, and dashed variational edge for $q_{\phi}(z\mid x)$.

Loss / Objective

The method maximizes the variational lower bound; for the Gaussian VAE with diagonal posterior it uses:

$$\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) \approx \frac{1}{2}\sum_{j=1}^{J}\left(1 + \log\left((\sigma_j^{(i)})^2\right) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\right) + \frac{1}{L}\sum_{l=1}^{L} \log p_{\theta}(\mathbf{x}^{(i)} \mid \mathbf{z}^{(i,l)})$$
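The two terms of this estimator can be computed as follows: the first sum is the analytic negative KL divergence from the diagonal-Gaussian posterior to a standard-normal prior, and the second is a Monte Carlo average of the reconstruction log-likelihood over $L$ samples drawn as $\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}$. This is a hedged sketch, not the authors' implementation: the function names, the Bernoulli likelihood choice, the fixed seed, and the stand-in `toy_decode` (one logistic unit per dimension) are all assumptions made for the example.

```python
import math
import random

def kl_term(mu, sigma):
    """First term of the estimator: -KL(q || N(0, I)) for diagonal-Gaussian q."""
    return 0.5 * sum(
        1.0 + math.log(s * s) - m * m - s * s for m, s in zip(mu, sigma)
    )

def bernoulli_log_likelihood(x, probs):
    """log p(x | z) when the decoder outputs Bernoulli means `probs`."""
    return sum(
        xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
        for xi, p in zip(x, probs)
    )

def lower_bound_estimate(x, mu, sigma, decode, num_samples=1, seed=0):
    """Monte Carlo estimate of the lower bound for one datapoint."""
    rng = random.Random(seed)
    recon = 0.0
    for _ in range(num_samples):
        # Reparameterized draw: z = mu + sigma * eps, eps ~ N(0, I).
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        recon += bernoulli_log_likelihood(x, decode(z))
    return kl_term(mu, sigma) + recon / num_samples

def toy_decode(z):
    """Hypothetical stand-in decoder: sigmoid of each latent coordinate."""
    return [1.0 / (1.0 + math.exp(-zj)) for zj in z]

estimate = lower_bound_estimate(
    x=[1.0, 0.0], mu=[0.0, 0.0], sigma=[1.0, 1.0],
    decode=toy_decode, num_samples=5,
)
```

With $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\sigma} = \mathbf{1}$ the posterior equals the prior, so the first term is zero and the estimate reduces to the averaged reconstruction term.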

Sampling Rule

Sampling uses the pathwise reparameterization:

$$\mathbf{z}^{(i,l)} = \boldsymbol{\mu}^{(i)} + \boldsymbol{\sigma}^{(i)} \odot \boldsymbol{\epsilon}^{(l)}, \qquad \boldsymbol{\epsilon}^{(l)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
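The rule above can be sketched in a few lines of pure Python. The function name `reparameterize` and the example values are assumptions for illustration; the point is that the noise $\boldsymbol{\epsilon}$ is drawn independently of $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$, so the sample is a deterministic, differentiable function of the encoder outputs once $\boldsymbol{\epsilon}$ is fixed.

```python
import random

def reparameterize(mu, sigma, rng):
    """Return z = mu + sigma * eps (elementwise) and the noise eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

# Example values are arbitrary; the seed is fixed only for repeatability.
rng = random.Random(0)
mu = [0.5, -1.0]
sigma = [0.1, 2.0]
z, eps = reparameterize(mu, sigma, rng)
```

Returning `eps` alongside `z` is a convenience for inspection: holding `eps` fixed and perturbing `mu` or `sigma` shows how the sample moves with the encoder outputs.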

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Results plots: MNIST and Frey Face lower-bound curves across latent dimensions, where AEVB train/test curves rise faster and higher than wake-sleep.

Sample grid: Frey Face latent-manifold traversal with smooth identity and expression changes across the grid.

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

No prior vault papers identified yet.

Further Papers