Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma

2020 · ICLR

Problem

Framing

Earlier discrete score-based models and DDPMs relied on fixed noise ladders, needed a separate sampler derivation for each model, and offered no route to exact likelihood computation. The paper unifies them as continuous-time SDEs, then generates samples with reverse-time SDEs, predictor-corrector sampling, and a probability-flow ODE. On CIFAR-10, it reports an FID of 2.20 and a likelihood of 2.99 bits/dim.

Currently Used Methods

Foundational

Proposed Method

Architecture

The framework has three parts: a forward SDE, a time-conditioned score network $\mathbf{s}_{\theta}(\mathbf{x}, t)$, and a reverse-time solver. Experiments use DDPM backbones and NCSN++, which adds FIR up/downsampling, $1/\sqrt{2}$ skip rescaling, BigGAN-style residual blocks, and progressive input/output paths.

Overview diagram: data are diffused to noise by a forward SDE, and a learned score field drives the reverse SDE from noise back to data.
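For orientation, the forward process can be written as an Itô SDE with drift $\mathbf{f}$ and diffusion coefficient $g$, the same two symbols that appear in the sampling equations. The block below states that general form together with the two parameterizations named in Table 1 (VE and VP) as they are commonly written. The noise scale $\sigma(t)$ and noise rate $\beta(t)$ are notation assumed here, since this summary does not define them.

```latex
% General forward SDE: f is the drift, g the diffusion coefficient, w a Wiener process
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}

% VE SDE (variance exploding): zero drift, noise scale \sigma(t) assumed notation
\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\,[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w}

% VP SDE (variance preserving): linear drift, noise rate \beta(t) assumed notation
\mathrm{d}\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\mathbf{w}
```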

Loss / Objective

Training fits the time-dependent score along SDE marginals.

$$
\min_{\theta} \; \mathbb{E}_{t \sim \mathrm{Unif}(0,T)} \left[ \lambda(t) \, \mathbb{E}_{\mathbf{x}(0) \sim p_0} \, \mathbb{E}_{\mathbf{x}(t) \sim p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0))} \left[ \left\| \mathbf{s}_{\theta}(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) \right\|_2^2 \right] \right]
$$
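The objective can be estimated by Monte Carlo when the transition kernel $p_{0t}$ is Gaussian. The sketch below is a minimal NumPy illustration, not the paper's training code. It assumes a VE-style kernel $\mathcal{N}(\mathbf{x}(0), \sigma(t)^2 I)$ with a geometric schedule, the weighting $\lambda(t) = \sigma(t)^2$, and toy standard-normal data. The names `kernel_std`, `dsm_loss`, and `true_score` are hypothetical.

```python
import numpy as np

def kernel_std(t, sigma_min=0.01, sigma_max=10.0):
    """Std of the assumed Gaussian kernel p_0t(x(t) | x(0)) (geometric schedule)."""
    return sigma_min * (sigma_max / sigma_min) ** t

def dsm_loss(score_fn, x0, rng, T=1.0, eps=1e-5):
    """Monte Carlo estimate of the weighted denoising score matching objective."""
    t = rng.uniform(eps, T, size=x0.shape[0])      # t ~ Unif(0, T), kept away from 0
    std = kernel_std(t)[:, None]
    z = rng.standard_normal(x0.shape)
    xt = x0 + std * z                              # x(t) ~ N(x(0), std^2 I)
    target = -z / std                              # grad of log p_0t(x(t) | x(0))
    lam = std ** 2                                 # assumed weighting lambda(t)
    sq_err = np.sum((score_fn(xt, t) - target) ** 2, axis=1, keepdims=True)
    return float(np.mean(lam * sq_err))

def true_score(x, t, data_std=1.0):
    """Exact marginal score when the data are N(0, data_std^2 I) (toy assumption)."""
    var = data_std ** 2 + kernel_std(t)[:, None] ** 2
    return -x / var

# Usage: the exact score should reach a lower loss than a score that is zero everywhere.
x0 = np.random.default_rng(0).standard_normal((20000, 2))
loss_true = dsm_loss(true_score, x0, np.random.default_rng(1))
loss_zero = dsm_loss(lambda x, t: np.zeros_like(x), x0, np.random.default_rng(1))
print(loss_true < loss_zero)
```

In a real model, `score_fn` would be the network $\mathbf{s}_{\theta}$ and the loss would be minimized by gradient descent. Here the closed-form score only serves to check that the estimator ranks candidates sensibly.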

Sampling Rule / Algorithm

Generation solves the reverse-time SDE; the paper also uses a deterministic probability-flow ODE.

$$
\mathrm{d}\mathbf{x} = \left[ \mathbf{f}(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] \, \mathrm{d}t + g(t) \, \mathrm{d}\bar{\mathbf{w}} \qquad \text{(reverse-time SDE)}
$$

$$
\mathrm{d}\mathbf{x} = \left[ \mathbf{f}(\mathbf{x}, t) - \frac{1}{2} g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] \, \mathrm{d}t \qquad \text{(probability-flow ODE)}
$$
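Both equations can be integrated backward in time with simple fixed-step solvers. The sketch below is a toy illustration under stated assumptions: a VE-style SDE with zero drift and a geometric noise schedule, 1-D Gaussian data so that the score $\nabla_{\mathbf{x}} \log p_t$ is known in closed form and stands in for $\mathbf{s}_{\theta}$, and initial samples drawn from the exact marginal at $t = 1$. All names and constants are hypothetical.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0
DATA_MEAN, DATA_STD = 2.0, 0.5      # toy 1-D Gaussian data distribution (assumed)

def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def g(t):
    # VE-style diffusion coefficient: g(t)^2 = d sigma(t)^2 / dt; the drift f is zero
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))

def marginal_var(t):
    return DATA_STD ** 2 + sigma(t) ** 2 - SIGMA_MIN ** 2

def score(x, t):
    # exact score of p_t = N(DATA_MEAN, marginal_var(t)); stands in for s_theta
    return -(x - DATA_MEAN) / marginal_var(t)

def sample(n, steps, rng, stochastic=True):
    """Integrate from t=1 to t=0 with Euler-Maruyama (SDE) or Euler (ODE)."""
    ts = np.linspace(1.0, 0.0, steps + 1)
    x = DATA_MEAN + np.sqrt(marginal_var(1.0)) * rng.standard_normal(n)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t                                  # negative: time runs backward
        if stochastic:
            drift = -g(t) ** 2 * score(x, t)             # f - g^2 * score, with f = 0
            x = x + drift * dt + g(t) * np.sqrt(-dt) * rng.standard_normal(n)
        else:
            drift = -0.5 * g(t) ** 2 * score(x, t)       # f - (1/2) g^2 * score
            x = x + drift * dt
    return x

rng = np.random.default_rng(0)
x_sde = sample(20000, 1000, rng, stochastic=True)
x_ode = sample(20000, 1000, rng, stochastic=False)
print(x_sde.mean(), x_sde.std(), x_ode.mean(), x_ode.std())
```

Both solvers should land close to the toy data mean and standard deviation. The ODE path is deterministic given the initial sample, which is what makes likelihood computation along it possible.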

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: CIFAR-10 FID for reverse-time samplers under VE-SDE and VP-SDE parameterizations.

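| Predictor | VE P1000 | VE P2000 | VE C2000 | VE PC1000 | VP P1000 | VP P2000 | VP C2000 | VP PC1000 |
|---|---|---|---|---|---|---|---|---|
| ancestral sampling | 4.98 $\pm$ .06 | 4.88 $\pm$ .06 | | 3.62 $\pm$ .03 | 3.24 $\pm$ .02 | 3.24 $\pm$ .02 | | 3.21 $\pm$ .02 |
| reverse diffusion | 4.79 $\pm$ .07 | 4.74 $\pm$ .08 | 20.43 $\pm$ .07 | 3.60 $\pm$ .02 | 3.21 $\pm$ .02 | 3.19 $\pm$ .02 | 19.06 $\pm$ .06 | 3.18 $\pm$ .01 |
| probability flow | 15.41 $\pm$ .15 | 10.54 $\pm$ .08 | | 3.51 $\pm$ .04 | 3.59 $\pm$ .04 | 3.23 $\pm$ .03 | | 3.06 $\pm$ .03 |

P = predictor only, C = corrector only, PC = predictor-corrector; the number is the count of score-function evaluations. The C2000 columns use no predictor, so their value does not depend on the row and is listed once.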
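The P, C, and PC columns correspond to switching the two halves of a predictor-corrector loop on or off. The sketch below illustrates that loop on a toy problem. It assumes a reverse-diffusion predictor for a VE-style noise schedule, a Langevin corrector whose step size is set from a signal-to-noise ratio `snr` using batch-averaged magnitudes, and a closed-form score for 1-D Gaussian data in place of a trained network. Names and constants are hypothetical, and the toy is not meant to reproduce the FID ranking in Table 1.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0
DATA_MEAN, DATA_STD = -1.0, 0.3     # toy 1-D Gaussian data distribution (assumed)

def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def marginal_var(t):
    return DATA_STD ** 2 + sigma(t) ** 2 - SIGMA_MIN ** 2

def score(x, t):
    # exact score of the toy marginal p_t; stands in for a trained s_theta
    return -(x - DATA_MEAN) / marginal_var(t)

def pc_sample(n, steps, rng, snr=0.16, use_predictor=True, use_corrector=True):
    """Alternate a reverse-diffusion predictor step and a Langevin corrector step."""
    ts = np.linspace(1.0, 0.0, steps + 1)
    x = DATA_MEAN + np.sqrt(marginal_var(1.0)) * rng.standard_normal(n)
    for t, t_next in zip(ts[:-1], ts[1:]):
        if use_predictor:
            # predictor: one discretized reverse-SDE step from noise level t to t_next
            dvar = sigma(t) ** 2 - sigma(t_next) ** 2
            x = x + dvar * score(x, t) + np.sqrt(dvar) * rng.standard_normal(n)
        if use_corrector:
            # corrector: one Langevin step targeting the marginal at t_next
            z = rng.standard_normal(n)
            grad = score(x, t_next)
            eps = 2.0 * (snr * np.mean(np.abs(z)) / np.mean(np.abs(grad))) ** 2
            x = x + eps * grad + np.sqrt(2.0 * eps) * z
    return x

rng = np.random.default_rng(0)
x_pc = pc_sample(20000, 500, rng)                        # analogous to a "PC" column
x_p = pc_sample(20000, 500, rng, use_corrector=False)    # analogous to a "P" column
print(x_pc.mean(), x_pc.std(), x_p.mean(), x_p.std())
```

With an exact score the corrector has little to fix, so the two runs should give similar statistics. The corrector is intended for the case where the predictor's discretization error or an imperfect learned score pushes samples off the target marginal.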

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers