Improved Denoising Diffusion Probabilistic Models

Alex Nichol, Prafulla Dhariwal

2021 · ICML

Problem

Framing

DDPMs still pay a steep sampling cost and leave likelihood on the table when reverse variances are fixed. This paper closes both gaps with learned variances, a hybrid $\epsilon$-plus-VLB objective, and a cosine noise schedule. CIFAR-10 reaches 2.94 bits/dim, and ImageNet 64×64 reaches 3.53 bits/dim.

Currently Used Methods

Direct antecedents

Proposed Method

Architecture

The model keeps the DDPM U-Net and changes the reverse-process parameterization. The network predicts the mean through the usual $\boldsymbol{\epsilon}_{\theta}$ path and learns the variance through an interpolation variable $\mathbf{v}$ between $\beta_t$ and $\tilde{\beta}_t$.

Verified figure: linear-vs-cosine noise schedules, with latent image strips showing that cosine preserves structure longer while linear becomes pure noise earlier.

Loss / Objective

Training uses a hybrid objective that keeps the DDPM denoising loss dominant while adding a small variational term.

$$L_{\mathrm{hybrid}} = L_{\mathrm{simple}} + \lambda L_{\mathrm{vlb}}$$

$$L_{\mathrm{simple}} = \mathbb{E}_{t,\mathbf{x}_0,\boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}(\mathbf{x}_t,t) \right\|^2 \right]$$

$$\Sigma_{\theta}(\mathbf{x}_t,t) = \exp\!\left( \mathbf{v} \log \beta_t + (1-\mathbf{v}) \log \tilde{\beta}_t \right)$$
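The objective and variance parameterization above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the default weight `lam=0.001` is a placeholder for a small $\lambda$ and is not given in the text above, the network outputs `eps_pred` and `v` are taken as given arrays, and `l_vlb` is passed in as a precomputed scalar rather than derived here.

```python
import numpy as np

def interpolated_log_variance(v, beta_t, beta_tilde_t):
    """log Sigma_theta = v * log(beta_t) + (1 - v) * log(beta_tilde_t), elementwise."""
    v = np.asarray(v, dtype=np.float64)
    return v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t)

def simple_loss(eps, eps_pred):
    """Batch estimate of L_simple: squared L2 error per sample, averaged over the batch.

    Summing over non-batch dimensions matches the norm in the formula;
    practical code often averages over them instead (an implementation choice).
    """
    sq = (np.asarray(eps) - np.asarray(eps_pred)) ** 2
    return sq.reshape(sq.shape[0], -1).sum(axis=1).mean()

def hybrid_loss(l_simple, l_vlb, lam=0.001):
    """L_hybrid = L_simple + lam * L_vlb; `lam` default is an assumed small weight."""
    return l_simple + lam * l_vlb
```

With `v = 0.5` the resulting variance is the geometric mean of $\beta_t$ and $\tilde{\beta}_t$; `v = 1` recovers $\beta_t$ and `v = 0` recovers $\tilde{\beta}_t$.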

Sampling Rule

Sampling remains ancestral, with the learned reverse variance and a cosine cumulative-noise schedule.

$$p_{\theta}(\mathbf{x}_{t-1}\mid \mathbf{x}_t) = \mathcal{N}\!\left( \mathbf{x}_{t-1}; \boldsymbol{\mu}_{\theta}(\mathbf{x}_t,t), \Sigma_{\theta}(\mathbf{x}_t,t) \right)$$

$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t)=\cos^2\!\left( \frac{t/T+s}{1+s} \cdot \frac{\pi}{2} \right)$$
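A minimal NumPy sketch of the cosine schedule and one ancestral step, under stated assumptions. The offset `s=0.008` and the clipping bound `max_beta=0.999` are illustrative defaults not given in the text above. The relations $\beta_t = 1 - \bar{\alpha}_t/\bar{\alpha}_{t-1}$ and $\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$, the conversion from predicted noise to mean, and the noise-free final step follow the standard DDPM formulation. Function names are hypothetical, and `eps_pred` and `v` stand in for network outputs.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """alpha_bar_t = f(t) / f(0) for t = 0..T, f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2)."""
    t = np.arange(T + 1, dtype=np.float64)
    f = np.cos(((t / T + s) / (1.0 + s)) * np.pi / 2.0) ** 2
    return f / f[0]

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}; the clipping bound is an assumption."""
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

def ancestral_step(x_t, t, eps_pred, v, betas, rng):
    """Sample x_{t-1} ~ N(mu_theta, Sigma_theta) for a 1-indexed step t in 1..T."""
    x_t = np.asarray(x_t, dtype=np.float64)
    # Recompute alpha_bar from the (possibly clipped) betas so both stay consistent.
    # Recomputed per call for clarity; real code would cache it.
    alpha_bar = np.concatenate(([1.0], np.cumprod(1.0 - betas)))
    beta_t = betas[t - 1]
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
    # Mean from predicted noise (standard DDPM epsilon-parameterization).
    mean = (x_t - beta_t / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(1.0 - beta_t)
    if t == 1:
        return mean  # no noise on the final step; also avoids log(0) since beta_tilde_1 = 0
    beta_tilde = (1.0 - ab_prev) / (1.0 - ab_t) * beta_t
    # Learned variance: log-domain interpolation between beta_t and beta_tilde_t.
    log_var = v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde)
    noise = rng.standard_normal(x_t.shape)
    return mean + np.exp(0.5 * log_var) * noise

# Usage: one reverse step from t = T with placeholder network outputs.
T = 1000
betas = betas_from_alpha_bar(cosine_alpha_bar(T))
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))  # stand-in for x_T
x = ancestral_step(x, T, np.zeros_like(x), 0.5, betas, rng)
```

A full sampler would loop `t` from `T` down to 1 and call the network at each step to obtain `eps_pred` and `v`.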

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: Ablating schedule and objective on ImageNet 64×64.

| Iters | T | Schedule | Objective | NLL | FID |
|---|---|---|---|---|---|
| 200K | 1K | linear | $L_{\mathrm{simple}}$ | 3.99 | 32.5 |
| 200K | 4K | linear | $L_{\mathrm{simple}}$ | 3.77 | 31.3 |
| 200K | 4K | linear | $L_{\mathrm{hybrid}}$ | 3.66 | 32.2 |
| 200K | 4K | cosine | $L_{\mathrm{simple}}$ | 3.68 | 27.0 |
| 200K | 4K | cosine | $L_{\mathrm{hybrid}}$ | 3.62 | 28.0 |
| 200K | 4K | cosine | $L_{\mathrm{vlb}}$ | 3.57 | 56.7 |
| 1.5M | 4K | cosine | $L_{\mathrm{hybrid}}$ | 3.57 | 19.2 |
| 1.5M | 4K | cosine | $L_{\mathrm{vlb}}$ | 3.53 | 40.1 |

Verified results plot: NLL versus evaluation steps on ImageNet 64×64 and CIFAR-10, showing the paper's $L_{\mathrm{hybrid}}$ curves below fixed-variance and DDIM-style baselines, especially at low step counts.

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers