Density Estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

2017 · ICLR

Problem

Framing

Earlier likelihood-based models forced a trade-off among exact density evaluation, exact inference, and fast sampling on high-dimensional images. Real NVP removes this trade-off with affine coupling layers whose Jacobian is triangular, so the log-determinant is cheap to compute and both density evaluation and inversion stay exact. It reports 3.49 bits/dim on CIFAR-10 and 2.72 bits/dim on LSUN (bedroom).

Currently Used Methods

Foundational

Proposed Method

Architecture

Real NVP stacks affine coupling layers with alternating checkerboard and channel-wise masks. A squeeze operation trades spatial size for channels, and a multi-scale scheme factors out variables across resolutions until a final $4 \times 4 \times c$ tensor.

Figure (architecture): checkerboard masking is applied before squeezing; channel-wise masking is applied after the squeeze reshapes spatial positions into channels.
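The masking and squeeze steps described above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, the checkerboard parity (1 where row + column is even) is an arbitrary convention, and the squeeze is assumed to fold each 2×2 spatial block into channels, turning an (H, W, C) tensor into (H/2, W/2, 4C).

```python
import numpy as np

def checkerboard_mask(height, width):
    """Binary spatial mask: 1 where (row + col) is even, 0 elsewhere (assumed convention)."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    return ((rows + cols) % 2 == 0).astype(np.float32)

def squeeze(x):
    """Fold each 2x2 spatial block into channels: (H, W, C) -> (H/2, W/2, 4C)."""
    h, w, c = x.shape
    assert h % 2 == 0 and w % 2 == 0, "spatial dimensions must be even"
    x = x.reshape(h // 2, 2, w // 2, 2, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the 2x2 block next to the channel axis
    return x.reshape(h // 2, w // 2, 4 * c)

def unsqueeze(y):
    """Exact inverse of squeeze: (H, W, 4C) -> (2H, 2W, C)."""
    h, w, c4 = y.shape
    c = c4 // 4
    y = y.reshape(h, w, 2, 2, c)
    y = y.transpose(0, 2, 1, 3, 4)  # restore the interleaved row/column layout
    return y.reshape(2 * h, 2 * w, c)

# Usage: a toy 4x4 image with 3 channels.
image = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
mask = checkerboard_mask(4, 4)
squeezed = squeeze(image)
```

Because the squeeze is a pure reshape and transpose, it is a permutation of entries and contributes nothing to the log-determinant of the overall map.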

Loss / Objective

Training maximizes exact log-likelihood under change of variables.

$$\log p_X(\mathbf{x}) = \log p_Z\big(f(\mathbf{x})\big) + \log \left| \det \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}^T} \right|$$
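As a small sanity check of the change-of-variables objective, the sketch below evaluates it for a toy elementwise affine map $f(\mathbf{x}) = (\mathbf{x} - \mu) \odot \exp(-\log\sigma)$ with a standard normal prior, and converts the result to bits per dimension. The map, the prior, and all function names are assumptions chosen for illustration; any data-preprocessing correction that a full bits/dim computation on images would need is omitted.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of an isotropic standard normal, summed over all dimensions."""
    return -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))

def log_likelihood(x, mu, log_sigma):
    """log p_X(x) = log p_Z(f(x)) + log|det df/dx| for the toy map f(x) = (x - mu) / sigma."""
    z = (x - mu) * np.exp(-log_sigma)
    # The Jacobian of f is diagonal with entries 1/sigma, so log|det| = -sum(log_sigma).
    log_det = -np.sum(log_sigma)
    return standard_normal_logpdf(z) + log_det

def bits_per_dim(log_px, num_dims):
    """Convert a log-likelihood in nats into bits per dimension (lower is better)."""
    return -log_px / (num_dims * np.log(2.0))

# Usage on a 3-dimensional toy input.
x = np.array([0.5, -1.0, 2.0])
mu = np.array([0.1, 0.2, -0.3])
log_sigma = np.array([0.0, 0.5, -0.2])
bpd = bits_per_dim(log_likelihood(x, mu, log_sigma), x.size)
```

For this toy map the result coincides with the log-density of a diagonal Gaussian with mean `mu` and standard deviation `exp(log_sigma)`, which makes the formula easy to verify by hand.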

Sampling Rule / Algorithm

Each coupling layer is analytically invertible.

$$\begin{aligned} \mathbf{y}_{1:d} &= \mathbf{x}_{1:d}, \\ \mathbf{y}_{d+1:D} &= \mathbf{x}_{d+1:D} \odot \exp\big(s(\mathbf{x}_{1:d})\big) + t(\mathbf{x}_{1:d}), \\ \mathbf{x}_{1:d} &= \mathbf{y}_{1:d}, \\ \mathbf{x}_{d+1:D} &= \big(\mathbf{y}_{d+1:D} - t(\mathbf{y}_{1:d})\big) \odot \exp\big(-s(\mathbf{y}_{1:d})\big). \end{aligned}$$
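The forward and inverse coupling equations above translate directly into code. The sketch below is illustrative only: `s` and `t` are stand-in functions built from small random linear maps (the paper uses deep networks), and the sizes `D` and `d` are arbitrary. Because the first `d` coordinates pass through unchanged, the Jacobian is triangular and its log-determinant is the sum of the entries of $s(\mathbf{x}_{1:d})$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3  # total dimension and split point (arbitrary toy sizes)

# Hypothetical stand-ins for the scale and translation networks.
W_s = rng.normal(size=(d, D - d)) * 0.1
W_t = rng.normal(size=(d, D - d)) * 0.1

def s(x1):
    """Toy scale function mapping the first d coordinates to D - d log-scales."""
    return np.tanh(x1 @ W_s)

def t(x1):
    """Toy translation function mapping the first d coordinates to D - d shifts."""
    return x1 @ W_t

def coupling_forward(x):
    """y_{1:d} = x_{1:d};  y_{d+1:D} = x_{d+1:D} * exp(s(x_{1:d})) + t(x_{1:d})."""
    x1, x2 = x[:d], x[d:]
    log_scale = s(x1)
    y2 = x2 * np.exp(log_scale) + t(x1)
    log_det = np.sum(log_scale)  # log|det J| of a triangular Jacobian
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y):
    """x_{1:d} = y_{1:d};  x_{d+1:D} = (y_{d+1:D} - t(y_{1:d})) * exp(-s(y_{1:d}))."""
    y1, y2 = y[:d], y[d:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2])

# Usage: a round trip through the layer recovers the input up to float error.
x = rng.normal(size=D)
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
```

Note that inversion never requires inverting `s` or `t` themselves; they are only evaluated on the unchanged half, which is why they can be arbitrarily complex.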

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: Bits/dim results across CIFAR-10, ImageNet, LSUN, and CelebA.

| Dataset | PixelRNN [46] | Real NVP | Conv DRAW [22] | IAF-VAE [34] |
|---|---|---|---|---|
| CIFAR-10 | 3.00 | 3.49 | < 3.59 | < 3.28 |
| ImageNet ($32 \times 32$) | 3.86 (3.83) | 4.28 (4.26) | < 4.40 (4.35) | |
| ImageNet ($64 \times 64$) | 3.63 (3.57) | 3.98 (3.75) | < 4.10 (4.04) | |
| LSUN (bedroom) | | 2.72 (2.70) | | |
| LSUN (tower) | | 2.81 (2.78) | | |
| LSUN (church outdoor) | | 3.08 (2.94) | | |
| CelebA | | 3.02 (2.97) | | |

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers