Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, Soumith Chintala

2015 · ICLR

Problem

Framing

GANs produced sharp samples, but deeper convolutional GANs were unstable to train, and the features they learned had not been validated on downstream tasks. DCGAN addresses both gaps with a constrained all-convolutional architecture, batch normalization, and stable Adam settings; features taken from its discriminator reach 82.8% accuracy on CIFAR-10.

Currently Used Methods

Foundational

Proposed Method

Architecture

DCGAN removes pooling and hidden fully connected layers. The generator maps $\mathbf{z} \in \mathbb{R}^{100}$ to a $64 \times 64 \times 3$ image through four fractionally strided convolutions; the discriminator mirrors this with strided convolutions. The generator uses ReLU in its hidden layers and $\tanh$ at the output; the discriminator uses LeakyReLU. Batch normalization is applied in both networks.

Architecture diagram: a 100-D latent vector is projected to a $4 \times 4 \times 1024$ tensor, then upsampled through four stride-2 convolution blocks to a $64 \times 64 \times 3$ image.
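The upsampling path described above can be checked with a little shape arithmetic. The sketch below is illustrative only: the kernel size of 4 and padding of 1 are assumptions chosen so that each stride-2 block exactly doubles the spatial size, and the intermediate channel widths (512, 256, 128) are assumed to halve from 1024. None of the function or constant names come from the paper.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution.

    Assumes no output padding and dilation 1. Kernel 4 / padding 1 are
    illustrative assumptions, not values stated in the summary above.
    """
    return (size - 1) * stride - 2 * padding + kernel


# (out_channels, activation) for the four fractionally strided blocks.
# Intermediate widths are an assumption; the last block emits 3 RGB channels
# through tanh, as described in the text.
GENERATOR_BLOCKS = [(512, "relu"), (256, "relu"), (128, "relu"), (3, "tanh")]


def generator_shapes(start_size=4, start_channels=1024):
    """Return the (height, width, channels) shape after each stage."""
    shapes = [(start_size, start_size, start_channels)]
    size = start_size
    for channels, _activation in GENERATOR_BLOCKS:
        size = deconv_out(size)  # each stride-2 block doubles the size
        shapes.append((size, size, channels))
    return shapes


if __name__ == "__main__":
    for shape in generator_shapes():
        print(shape)
```

Under these assumptions the spatial size runs 4, 8, 16, 32, 64, which matches the four-block path from the projected latent tensor to the output image.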

Loss / Objective

The model keeps the standard GAN minimax game.

$$\min_G \max_D \, V(D,G) = \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\left[\log D(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\left[\log \left(1 - D(G(\mathbf{z}))\right)\right]$$
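As a rough illustration of the minimax objective above, the value function can be estimated on a minibatch by averaging the two log terms over discriminator outputs. This is a minimal sketch, not the authors' implementation: the function names, the clamp `eps`, and the use of plain Python lists of probabilities are all assumptions made for readability.

```python
import math


def value_function(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: probabilities D(x) on real samples.
    d_fake: probabilities D(G(z)) on generated samples.
    eps is a hypothetical clamp that avoids log(0).
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    """The discriminator maximizes V, so it minimizes -V."""
    return -value_function(d_real, d_fake)


def generator_loss(d_fake, eps=1e-12):
    """Minimax form: the generator minimizes E[log(1 - D(G(z)))]."""
    return sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
```

A discriminator that outputs 0.5 everywhere gives a value of $-2 \log 2$, and its loss falls as it separates real from generated samples more confidently.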

Algorithm

Training alternates discriminator and generator updates under the adversarial objective.

$$\mathbf{z} \sim \mathrm{Uniform}([-1,1]^{100}), \qquad \hat{\mathbf{x}} = G(\mathbf{z}), \qquad D(\mathbf{x}),\ D(\hat{\mathbf{x}}) \text{ drive the two-player update}$$
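The alternating scheme above might be organized as in the following sketch. It assumes one discriminator step followed by one generator step per minibatch, which the summary does not specify, and `generator`, `d_step`, and `g_step` are hypothetical callables standing in for the real networks and optimizer updates.

```python
import random


def sample_latent(batch_size, dim=100, rng=random):
    """Draw latent vectors with every coordinate uniform on [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(batch_size)]


def train(real_batches, generator, d_step, g_step, dim=100, rng=random):
    """Alternate discriminator and generator updates over minibatches.

    generator(z) -> generated samples
    d_step(real, fake) updates the discriminator from D(x) and D(x_hat)
    g_step(z) updates the generator through the discriminator
    All three are hypothetical callbacks supplied by the caller.
    """
    for real in real_batches:
        z = sample_latent(len(real), dim, rng)
        fake = generator(z)
        d_step(real, fake)  # discriminator update on real vs. generated
        z = sample_latent(len(real), dim, rng)
        g_step(z)           # generator update on fresh latent draws
```

Passing a seeded `random.Random` instance as `rng` makes the latent draws reproducible, which is convenient when comparing runs.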

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: CIFAR-10 classification using pretrained discriminator features

| Model | Accuracy | Accuracy (400 per class) | Max # of feature units |
| --- | --- | --- | --- |
| 1 Layer K-means | 80.6% | 63.7% (±0.7%) | 4800 |
| 3 Layer K-means Learned RF | 82.0% | 70.7% (±0.7%) | 3200 |
| View Invariant K-means | 81.9% | 72.6% (±0.7%) | 6400 |
| Exemplar CNN | 84.3% | 77.4% (±0.2%) | 1024 |
| DCGAN (ours) + L2-SVM | 82.8% | 73.8% (±0.4%) | 512 |

Sample grid: LSUN bedroom generations with coherent room layout, windows, beds, and lighting across many draws.

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers