High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz

2022 · CVPR

Problem

Framing

Pixel-space diffusion delivers strong likelihoods, but its cost scales badly with image resolution. The paper shifts diffusion to a perceptually compressed latent space, preserving fidelity while cutting training and sampling cost enough for high-quality $256\times256$ synthesis and flexible conditioning.

Currently Used Methods

Foundational

Proposed Method

Architecture

LDM splits generation into an autoencoder $\mathcal{E},\mathcal{D}$ and a latent diffusion U-Net over $\mathbf{z}=\mathcal{E}(\mathbf{x})$. At $256\times256$, typical latent grids are $64\times64\times3$ for $f=4$ and $32\times32\times4$ for $f=8$. Conditioning enters by concatenation or cross-attention inside the denoiser.
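The latent grid sizes quoted above follow from dividing the image side by the downsampling factor $f$. The sketch below reproduces that arithmetic; the helper names `latent_shape` and `compression_ratio` are hypothetical, and the channel count is treated as a free choice of the autoencoder rather than something derived from $f$.

```python
def latent_shape(height, width, f, channels):
    """Shape of z = E(x) for an image of size height x width.

    f is the spatial downsampling factor; channels is chosen by the
    autoencoder design and is passed in, not derived.
    """
    if height % f or width % f:
        raise ValueError("image size must be divisible by f")
    return (height // f, width // f, channels)


def compression_ratio(height, width, f, channels, image_channels=3):
    """Ratio of pixel-space values to latent-space values."""
    h, w, c = latent_shape(height, width, f, channels)
    return (height * width * image_channels) / (h * w * c)


print(latent_shape(256, 256, 4, 3))       # (64, 64, 3)
print(latent_shape(256, 256, 8, 4))       # (32, 32, 4)
print(compression_ratio(256, 256, 4, 3))  # 16.0
print(compression_ratio(256, 256, 8, 4))  # 48.0
```

The denoiser therefore operates on 16 to 48 times fewer values per image than a pixel-space model at the same resolution, which is where the cost saving comes from.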

Architecture diagram: an encoder maps images to latent space, diffusion runs with a latent U-Net, and conditioning enters through concatenation or cross-attention blocks.
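To make the cross-attention path concrete, here is a minimal NumPy sketch of a single attention head in which queries come from flattened latent features and keys and values come from the encoded conditioning. The function name, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are illustrative assumptions; a real denoiser would use learned multi-head layers inside its U-Net blocks.

```python
import numpy as np


def cross_attention(features, context, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative sketch).

    features: (n_latent_tokens, d_feat) flattened U-Net feature map
    context:  (n_cond_tokens, d_ctx)    encoded conditioning tokens
    Returns the attended output and the attention weights.
    """
    q = features @ w_q                      # queries from the latent side
    k = context @ w_k                       # keys from the conditioning
    v = context @ w_v                       # values from the conditioning
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
features = rng.standard_normal((32 * 32, 64))   # one token per latent position
context = rng.standard_normal((10, 48))         # ten conditioning tokens
w_q = rng.standard_normal((64, 16))
w_k = rng.standard_normal((48, 16))
w_v = rng.standard_normal((48, 16))
out, weights = cross_attention(features, context, w_q, w_k, w_v)
```

Each latent position receives a weighted mixture of conditioning tokens, so the conditioning can differ in length and modality from the latent grid. Concatenation, by contrast, requires the conditioning to be spatially aligned with the latent.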

Loss / Objective

The denoiser predicts Gaussian noise in latent space:

$$L_{\mathrm{LDM}} = \mathbb{E}_{\mathcal{E}(\mathbf{x}),\,\mathbf{y},\,\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I}),\,t}\left[\left\|\boldsymbol{\epsilon}-\boldsymbol{\epsilon}_{\theta}(\mathbf{z}_t,t,\tau_{\theta}(\mathbf{y}))\right\|_2^2\right]$$
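A Monte Carlo estimate of this objective can be sketched in NumPy as follows. The summary does not state how $\mathbf{z}_t$ is produced, so the sketch assumes the standard DDPM forward process $\mathbf{z}_t=\sqrt{\bar\alpha_t}\,\mathbf{z}_0+\sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ with a linear $\beta$ schedule. `make_alpha_bars`, `ldm_loss`, and `eps_model` are hypothetical names; `eps_model` stands in for $\boldsymbol{\epsilon}_\theta$ and receives the already encoded conditioning $\tau_\theta(\mathbf{y})$.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ldm_loss(z0, cond, eps_model, alpha_bars, rng):
    """One-batch estimate of the noise-prediction loss in latent space.

    z0:   (batch, h, w, c) latents, i.e. E(x) computed beforehand
    cond: encoded conditioning, passed through to eps_model unchanged
    """
    batch = z0.shape[0]
    t = rng.integers(0, len(alpha_bars), size=batch)  # index 0 is step t=1
    eps = rng.standard_normal(z0.shape)               # target noise
    a = alpha_bars[t].reshape(batch, 1, 1, 1)
    zt = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps     # assumed forward process
    eps_hat = eps_model(zt, t, cond)
    # Squared L2 norm per sample, averaged over the batch.
    return np.mean(np.sum((eps - eps_hat) ** 2, axis=(1, 2, 3)))


rng = np.random.default_rng(0)
z0 = rng.standard_normal((8, 32, 32, 4))
zero_model = lambda zt, t, cond: np.zeros_like(zt)    # untrained stand-in
loss = ldm_loss(z0, None, zero_model, make_alpha_bars(), rng)
```

With a predictor that always returns zero, the loss is close to the number of latent entries per sample (here $32\cdot32\cdot4=4096$), which gives a useful baseline for judging whether a trained denoiser has learned anything.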

Sampling Rule / Algorithm

Sampling runs reverse diffusion on latents, then decodes once at the end:

$$\mathbf{z}_{t-1} \sim p_{\theta}(\mathbf{z}_{t-1}\mid \mathbf{z}_t, \mathbf{y}), \qquad \mathbf{x}=\mathcal{D}(\mathbf{z}_0)$$
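The loop structure can be sketched as below. The summary does not name a specific sampler, so this uses the ancestral DDPM update with variance $\beta_t$ as an assumption; other samplers would change only the body of the loop. `sample_latents`, `eps_model`, and `decoder` are hypothetical stand-ins for the sampling routine, $\boldsymbol{\epsilon}_\theta$, and $\mathcal{D}$.

```python
import numpy as np


def sample_latents(eps_model, decoder, cond, shape, betas, rng):
    """Reverse diffusion in latent space, followed by a single decode.

    Assumes a DDPM-style ancestral update; the sampler is not specified
    by the text this sketch accompanies.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    z = rng.standard_normal(shape)                 # start from z_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(z, t, cond)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return decoder(z)                              # x = D(z_0), called once


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)
zero_model = lambda z, t, cond: np.zeros_like(z)   # untrained stand-in
# Toy decoder: keep 3 channels and upsample by 8 via nearest-neighbour repeat.
toy_decoder = lambda z: np.repeat(np.repeat(z[..., :3], 8, axis=1), 8, axis=2)
x = sample_latents(zero_model, toy_decoder, None, (1, 32, 32, 4), betas, rng)
```

Every denoising step touches only the small latent tensor, and the decoder runs once regardless of the number of steps, so sampling cost is dominated by the latent U-Net rather than by full-resolution operations.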

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Sample grid: random generations from LDMs on CelebAHQ, FFHQ, LSUN-Churches, LSUN-Bedrooms, and class-conditional ImageNet at $256\times256$.

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers