A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, Samuli Laine, Timo Aila

2019 · CVPR

Problem

Framing

Progressive GANs generate sharp high-resolution images, but a single input latent leaves semantics entangled across layers and scales. StyleGAN closes this gap with an intermediate latent $\mathbf{w}$, per-layer style control, and explicit noise inputs, reducing FFHQ FID from 8.04 to 4.40.

Currently Used Methods

Foundational

@goodfellowGAN2014 — adversarial training for implicit generative modeling.
- Limitation in context: no mechanism for scale-specific latent control.
@radfordDCGAN2015 — convolutional GAN design for image synthesis.
- Limitation in context: latent factors stay mixed through all generator layers.
Progressive Growing of GANs for Improved Quality, Stability, and Variation — stable high-resolution GAN training by progressive layer growth.
- Limitation in context: generator semantics remain entangled across resolutions.
Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization — AdaIN modulates features with channelwise affine parameters.
- Limitation in context: not a generative latent architecture with learned styles.

Proposed Method

Architecture

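The generator maps $\mathbf{z} \in \mathcal{Z}$ through an 8-layer MLP into $\mathbf{w} \in \mathcal{W}$. A separate 18-layer synthesis network (two layers per resolution, from $4^2$ to $1024^2$) starts from a learned $4 \times 4 \times 512$ constant. After each convolution it adds single-channel Gaussian noise, broadcast to all feature maps with learned per-channel scaling, and then applies AdaIN with a style computed from $\mathbf{w}$ by a learned affine transform.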

Architecture and ablation page: the left panel contrasts a traditional latent-input generator with StyleGAN's mapping network, learned constant input, per-layer AdaIN style control, and noise injection; the right panel shows the main FID ablation table.

Loss / Objective

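The paper keeps the GAN objective and changes the generator parameterization through adaptive instance normalization. Each feature map $\mathbf{x}_i$ is normalized separately, then scaled and biased by the corresponding scalar components of the style $\mathbf{y} = (\mathbf{y}_s, \mathbf{y}_b)$:

$$\mathrm{AdaIN}(\mathbf{x}_i, \mathbf{y}) = y_{s,i} \frac{\mathbf{x}_i - \mu(\mathbf{x}_i)}{\sigma(\mathbf{x}_i)} + y_{b,i}$$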
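The AdaIN formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `adain`, the `(C, H, W)` array layout, and the small `eps` added to the denominator for numerical safety are assumptions made here.

```python
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive instance normalization for one sample.

    x   : feature maps, shape (C, H, W)
    y_s : per-channel style scale, shape (C,)
    y_b : per-channel style bias, shape (C,)
    eps : assumed stabilizer, not part of the formula above
    """
    mu = x.mean(axis=(1, 2), keepdims=True)     # mu(x_i), one value per channel
    sigma = x.std(axis=(1, 2), keepdims=True)   # sigma(x_i), one value per channel
    x_norm = (x - mu) / (sigma + eps)
    return y_s[:, None, None] * x_norm + y_b[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
y_s = np.array([1.0, 2.0, 0.5, 3.0])
y_b = np.array([0.0, -1.0, 2.0, 0.5])
out = adain(x, y_s, y_b)
```

After the call, each output channel has mean `y_b[i]` and standard deviation `y_s[i]`, so the style fully determines the per-channel statistics and overrides whatever statistics the input carried.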

Sampling Rule / Algorithm

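Sampling maps $\mathbf{z}$ through the mapping network $f$ into $\mathbf{w}$, converts $\mathbf{w}$ into a style $\mathbf{y}^{(\ell)}$ for each layer $\ell$ through a learned affine transform $A^{(\ell)}$, then runs the synthesis network $g$ from the learned constant $\mathbf{c}$ with per-layer stochastic noise $\mathbf{n}^{(\ell)}$.

$$\mathbf{w} = f(\mathbf{z}), \qquad \mathbf{y}^{(\ell)} = A^{(\ell)}(\mathbf{w}), \qquad \mathbf{x} = g\big(\mathbf{c}; \{\mathbf{y}^{(\ell)}\}_{\ell=1}^{L}, \{\mathbf{n}^{(\ell)}\}_{\ell=1}^{L}\big)$$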
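The sampling rule can be traced end to end with a toy NumPy sketch. Everything here is a simplifying assumption for illustration: the sizes are tiny rather than 512 channels and 18 layers, random matrices stand in for learned weights, a per-pixel channel-mixing matrix stands in for each 3x3 convolution, and there is no upsampling. Only the data flow (mapping, per-layer affine style, noise added after the convolution, then AdaIN) follows the summary above.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, CHANNELS, LAYERS, SIZE = 16, 8, 4, 4  # toy sizes (assumed)

# Random stand-ins for learned parameters.
f_weights = [rng.normal(scale=LATENT ** -0.5, size=(LATENT, LATENT)) for _ in range(8)]
A = [rng.normal(scale=LATENT ** -0.5, size=(2 * CHANNELS, LATENT)) for _ in range(LAYERS)]
mix = [rng.normal(scale=CHANNELS ** -0.5, size=(CHANNELS, CHANNELS)) for _ in range(LAYERS)]
noise_scale = [rng.normal(scale=0.1, size=(CHANNELS, 1, 1)) for _ in range(LAYERS)]
const = rng.normal(size=(CHANNELS, SIZE, SIZE))  # learned constant c

def mapping(z):
    """w = f(z): 8 fully connected layers with leaky ReLU."""
    h = z
    for W in f_weights:
        h = W @ h
        h = np.where(h > 0, h, 0.2 * h)
    return h

def adain(x, y_s, y_b, eps=1e-8):
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return y_s[:, None, None] * (x - mu) / (sigma + eps) + y_b[:, None, None]

def synthesize(w, noises):
    """x = g(c; styles, noises) with a channel-mixing stand-in for convolution."""
    x = const
    for l in range(LAYERS):
        x = np.einsum("oc,chw->ohw", mix[l], x)   # stand-in for the convolution
        x = x + noise_scale[l] * noises[l]        # single-channel noise, per-channel scale
        y = A[l] @ w                              # y^(l) = A^(l)(w)
        x = adain(x, y[:CHANNELS], y[CHANNELS:])  # first half scale, second half bias
    return x

def sample(z, noises):
    return synthesize(mapping(z), noises)

z = rng.normal(size=LATENT)
noises_a = [rng.normal(size=(1, SIZE, SIZE)) for _ in range(LAYERS)]
noises_b = [rng.normal(size=(1, SIZE, SIZE)) for _ in range(LAYERS)]
img_a = sample(z, noises_a)
img_b = sample(z, noises_b)
```

With the same `z` and different noise, `img_a` and `img_b` differ pixel by pixel but share the same per-channel means, because the final AdaIN sets those statistics from the style alone. This mirrors, in miniature, the separation between style-controlled structure and noise-controlled detail.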

Training Procedure

Mapping network depth: 8 fully connected layers.
Intermediate latent dimensionality: 512.
Synthesis network: 18 layers.
Learned input constant: $4 \times 4 \times 512$.
Mixing regularization tested at 0%, 50%, 90%, 100%.
CelebA-HQ loss: WGAN-GP.
FFHQ loss: WGAN-GP for configuration A; non-saturating logistic with R1 regularization for configurations B-F.
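Mixing regularization generates a given percentage of training images from two latents instead of one, switching from the first to the second at a randomly chosen point in the synthesis network. A minimal sketch, assuming the latents have already been mapped to $\mathcal{W}$ and that each of the 18 synthesis layers receives its own copy; the function name `mix_styles` and the array layout are hypothetical.

```python
import numpy as np

def mix_styles(w1, w2, num_layers, mixing_prob, rng):
    """Return per-layer latents of shape (num_layers, dim).

    With probability mixing_prob, layers before a random crossover
    point use w1 and layers from the crossover onward use w2.
    Otherwise every layer uses w1.
    """
    ws = np.tile(w1, (num_layers, 1))
    if rng.random() < mixing_prob:
        crossover = rng.integers(1, num_layers)  # in [1, num_layers - 1]
        ws[crossover:] = w2
    return ws

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=512), rng.normal(size=512)
ws = mix_styles(w1, w2, num_layers=18, mixing_prob=0.9, rng=rng)
```

Setting `mixing_prob` to 0.0, 0.5, 0.9, or 1.0 corresponds to the four settings listed above. Because adjacent layers sometimes see unrelated styles during training, the network cannot assume that they are correlated.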

Evaluation

Datasets

CelebA-HQ
FFHQ
LSUN Bedroom
LSUN Car

Metrics

FID
Perceptual path length
Linear separability
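Perceptual path length measures how much the generated image changes under a small step along an interpolation path in latent space, averaged over random endpoints and positions. In $\mathcal{Z}$ the path is a spherical interpolation and the per-step distance is divided by $\epsilon^2$. The sketch below shows only the shape of that estimator. The toy `generator` and the squared Euclidean `distance` are stand-ins assumed here; the paper uses the trained generator and a learned perceptual distance.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between the directions of a and b."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))
    # Assumes a and b are not parallel, which holds for random Gaussian draws.
    return (np.sin((1 - t) * omega) * a_n + np.sin(t * omega) * b_n) / np.sin(omega)

def path_length(generator, distance, dim, num_samples, eps, rng):
    """Monte Carlo estimate of E[ d(G(slerp(t)), G(slerp(t + eps))) / eps^2 ]."""
    total = 0.0
    for _ in range(num_samples):
        z1, z2 = rng.normal(size=dim), rng.normal(size=dim)
        t = rng.uniform(0.0, 1.0)
        img_a = generator(slerp(z1, z2, t))
        img_b = generator(slerp(z1, z2, t + eps))
        total += distance(img_a, img_b) / eps ** 2
    return total / num_samples

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 16))                         # toy generator weights (assumed)
generator = lambda z: np.tanh(M @ z)                 # stand-in for G
distance = lambda a, b: float(np.sum((a - b) ** 2))  # stand-in for perceptual distance
ppl = path_length(generator, distance, dim=16, num_samples=200, eps=1e-4, rng=rng)
```

A lower value indicates a smoother latent space. For the $\mathcal{W}$ variant, the same estimator would use linear interpolation between mapped latents instead of `slerp`.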

Headline results

CelebA-HQ: baseline 7.79 FID; full model 5.06 FID.
FFHQ: baseline 8.04 FID; full model 4.40 FID.
FFHQ tuned baseline: 5.25 FID.
LSUN Bedroom at $256^2$: 2.65 FID.
LSUN Car at $512 \times 384$: 3.27 FID.

Table 1: FID for generator variants on CelebA-HQ and FFHQ

Method	CelebA-HQ	FFHQ
A Baseline Progressive GAN [30]	7.79	8.04
B + Tuning (incl. bilinear up/down)	6.11	5.25
C + Add mapping and styles	5.34	4.85
D + Remove traditional input	5.07	4.88
E + Add noise inputs	5.06	4.42
F + Mixing regularization	5.17	4.40

Ablations

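Adding the mapping network and styles improves FID over the tuned baseline on both datasets (CelebA-HQ 6.11 to 5.34, FFHQ 5.25 to 4.85).
Replacing the latent input with a learned constant helps CelebA-HQ (5.34 to 5.07) but leaves FFHQ essentially unchanged (4.85 to 4.88).
Noise inputs barely change CelebA-HQ FID (5.07 to 5.06) but improve FFHQ FID from 4.88 to 4.42.
Mixing regularization gives the best FFHQ FID (4.40) and improves mixed-latent robustness.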
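The per-step effect of each configuration can be read directly off Table 1. The short script below copies the FID values from the table and computes the change at each step plus the overall relative reduction from A to F; the variable names are illustrative only.

```python
# FID values copied from Table 1: (CelebA-HQ, FFHQ) per configuration.
fid = {
    "A": (7.79, 8.04),
    "B": (6.11, 5.25),
    "C": (5.34, 4.85),
    "D": (5.07, 4.88),
    "E": (5.06, 4.42),
    "F": (5.17, 4.40),
}
datasets = ("CelebA-HQ", "FFHQ")
configs = list(fid)

# Change in FID at each step (negative means improvement).
step_delta = {
    f"{prev}->{cur}": tuple(round(c - p, 2) for p, c in zip(fid[prev], fid[cur]))
    for prev, cur in zip(configs, configs[1:])
}

# Relative FID reduction from the baseline (A) to the full model (F).
rel_reduction = {
    name: (fid["A"][i] - fid["F"][i]) / fid["A"][i]
    for i, name in enumerate(datasets)
}

for step, delta in step_delta.items():
    print(step, dict(zip(datasets, delta)))
for name, r in rel_reduction.items():
    print(f"{name}: {100 * r:.1f}% lower FID than baseline")
```

From A to F the relative reduction is about 33.6% on CelebA-HQ and 45.3% on FFHQ. The step deltas also show that the largest single gain on both datasets comes from the tuning in configuration B.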

Method Strengths and Weaknesses

Strengths

Intermediate $\mathcal{W}$ improves separability over direct $\mathcal{Z}$ input.
Per-layer styles expose coarse-to-fine semantic control.
Noise inputs isolate stochastic detail from global structure.
Full design cuts FFHQ FID from 8.04 to 4.40.

Weaknesses

Mixing regularization slightly worsens CelebA-HQ FID from 5.06 to 5.17.
Gains come from several coupled changes, not one clean intervention.
Evaluation centers on faces and two LSUN domains.
The method still depends on GAN optimization stability.

Suggestions from the authors

Analyze why the intermediate latent space improves disentanglement.
Study truncation tradeoffs between fidelity, variation, and coverage.
Characterize style control across coarse, middle, and fine scales.
Extend the architecture to broader image domains.
