Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Problem
Framing
GANs produced sharp samples, but deeper convolutional GANs were unstable to train and their learned features were under-validated. DCGAN addresses both gaps with a constrained all-convolutional architecture, batch normalization, and stable Adam settings; its discriminator features reach 82.8% CIFAR-10 accuracy.
Currently Used Methods
Foundational
- @goodfellowGAN2014 — adversarial learning with a generator and discriminator.
- Limitation in context: vanilla MLP GANs did not train deep convolutional image models stably.
- @krizhevskyAlexNet2012 — deep convolutional design for strong supervised visual features.
- Limitation in context: supervised CNN heuristics did not directly stabilize adversarial co-training.
- "Striving for Simplicity: The All Convolutional Net" — replaces pooling with learned strided convolutions.
- Limitation in context: it does not address generator–discriminator optimization dynamics.
- "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" — batch normalization stabilizes deep optimization.
- Limitation in context: its placement inside GANs was not yet established.
- "Discriminative Unsupervised Feature Learning with Convolutional Neural Networks" — unsupervised CNN features transfer to classification.
- Limitation in context: generative training had not matched that representation quality.
Proposed Method
Architecture
DCGAN removes pooling and hidden fully connected layers. The generator maps a 100-dimensional latent vector z to a 64×64×3 image through four fractionally strided convolutions; the discriminator mirrors this with strided convolutions. The generator uses ReLU in its hidden layers and tanh at the output; the discriminator uses LeakyReLU and batch normalization.
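The spatial sizes in this architecture can be checked with a few lines of arithmetic. The sketch below assumes a 4×4 starting feature map, kernel size 4, stride 2, and padding 1; the kernel and padding values are illustrative choices that are not stated in this summary, and the helper names are hypothetical.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial size after one fractionally strided (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel, stride, padding):
    """Spatial size after one strided convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed settings (not taken from the summary): kernel 4, stride 2, padding 1.
KERNEL, STRIDE, PADDING = 4, 2, 1


def generator_sizes(start=4, layers=4):
    """Feature-map sizes as the generator upsamples from `start`."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], KERNEL, STRIDE, PADDING))
    return sizes


def discriminator_sizes(start=64, layers=4):
    """Feature-map sizes as the discriminator downsamples from `start`."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1], KERNEL, STRIDE, PADDING))
    return sizes


print(generator_sizes())      # [4, 8, 16, 32, 64]
print(discriminator_sizes())  # [64, 32, 16, 8, 4]
```

Under these assumptions each generator layer doubles the spatial size and each discriminator layer halves it, which is what lets strided convolutions stand in for pooling.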

Loss / Objective
The model keeps the standard GAN minimax game.
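For reference, the minimax objective from @goodfellowGAN2014 is usually written as follows. The symbols (D for the discriminator, G for the generator, p_data for the data distribution, p_z for the latent prior) follow that paper's notation and do not appear elsewhere in this summary.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```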
Algorithm
Training alternates discriminator and generator updates under the adversarial objective.
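The alternation can be illustrated on a deliberately tiny problem. The sketch below is not DCGAN: it assumes a one-dimensional toy setup with a linear generator, a logistic discriminator, hand-derived gradients, and plain SGD in place of Adam. All names and constants are hypothetical; only the alternating update pattern and the minimax losses carry over.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=200, lr=0.05, seed=0):
    """Alternate one discriminator step and one generator step per iteration."""
    rng = random.Random(seed)
    w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w * x + c)
    a, b = 1.0, 0.0   # generator G(z) = a * z + b
    for _ in range(steps):
        x = rng.gauss(3.0, 0.5)        # one "real" sample (toy data)
        z = rng.uniform(-1.0, 1.0)     # one latent sample
        g = a * z + b                  # fake sample

        # Discriminator step: descend -[log D(x) + log(1 - D(G(z)))].
        d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
        grad_w = -(1.0 - d_real) * x + d_fake * g
        grad_c = -(1.0 - d_real) + d_fake
        w, c = w - lr * grad_w, c - lr * grad_c

        # Generator step: descend log(1 - D(G(z))) with D held fixed.
        d_fake = sigmoid(w * g + c)
        grad_g = -d_fake * w           # d/dg of log(1 - sigmoid(w * g + c))
        a, b = a - lr * grad_g * z, b - lr * grad_g
    return (w, c), (a, b)


(w, c), (a, b) = train_toy_gan()
print(w, c, a, b)
```

The discriminator is updated first on one real and one fake sample, then the generator is updated against the freshly updated discriminator. Real implementations apply the same pattern to minibatches.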
Training Procedure
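- Latent prior: 100-dimensional uniform distribution.
- Images scaled to [-1, 1], the range of the tanh output.
- Optimizer: Adam.
- Learning rate: 0.0002.
- Batch size: 128.
- Momentum: β1 = 0.5.
- Weight init: zero-centered normal, std 0.02.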
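As a rough illustration of how these settings fit together, the sketch below collects the values reported in the DCGAN paper (learning rate 0.0002, β1 = 0.5, batch size 128, init std 0.02) and implements pixel scaling, weight initialization, and one scalar Adam update. `BETA2` and `EPS` are Adam's customary defaults and are assumptions here, as are all function names.

```python
import math
import random

# Values as reported in the DCGAN paper; BETA2 and EPS are assumed Adam defaults.
LEARNING_RATE = 0.0002
BETA1 = 0.5
BETA2 = 0.999
EPS = 1e-8
BATCH_SIZE = 128
INIT_STD = 0.02


def scale_pixel(value):
    """Map an 8-bit pixel in [0, 255] to [-1, 1], the range of a tanh output."""
    return value / 127.5 - 1.0


def init_weights(count, rng):
    """Draw weights from a zero-centered normal with standard deviation INIT_STD."""
    return [rng.gauss(0.0, INIT_STD) for _ in range(count)]


def adam_step(param, grad, m, v, t):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = BETA1 * m + (1.0 - BETA1) * grad
    v = BETA2 * v + (1.0 - BETA2) * grad * grad
    m_hat = m / (1.0 - BETA1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - BETA2 ** t)   # bias-corrected second moment
    param -= LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


weights = init_weights(4, random.Random(0))
param, m, v = adam_step(1.0, 2.0, 0.0, 0.0, 1)
print(scale_pixel(0), scale_pixel(255), param)
```

On the first step the bias-corrected update is close to `LEARNING_RATE * sign(grad)`. A lower β1 shortens the window of the gradient average, so the update direction reacts faster when the opposing network changes.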
Evaluation
Datasets
- LSUN bedrooms.
- ImageNet-1k, center crops.
- Faces dataset from 10K identities.
- CIFAR-10.
- SVHN.
- MNIST for nearest-neighbor analysis.
Metrics
- CIFAR-10 classification accuracy.
- CIFAR-10 accuracy with 400 labels per class.
- SVHN error rate with 1000 labels.
- MNIST nearest-neighbor test error.
- Qualitative sample inspection.
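The nearest-neighbor metric can be sketched in a few lines. The function below assumes the reference set holds labeled points (for example, generated samples with their conditioning labels) and the test set holds real labeled points. It uses squared Euclidean distance and brute-force search; the function name and the toy data are hypothetical.

```python
def nearest_neighbor_error(train_points, train_labels, test_points, test_labels):
    """Fraction of test points whose nearest training point has a different label."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    errors = 0
    for point, label in zip(test_points, test_labels):
        # Index of the closest training point under squared Euclidean distance.
        nearest = min(range(len(train_points)),
                      key=lambda i: sq_dist(train_points[i], point))
        errors += train_labels[nearest] != label
    return errors / len(test_points)


# Toy usage: two reference points, three test points, one of them misclassified.
train_x = [(0.0, 0.0), (1.0, 1.0)]
train_y = [0, 1]
test_x = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]
test_y = [0, 1, 1]
print(nearest_neighbor_error(train_x, train_y, test_x, test_y))
```

A lower error means the labeled reference set covers the real test distribution better, which is why the metric is used as a proxy for sample quality.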
Headline results
- CIFAR-10, ImageNet-pretrained features: 82.8% accuracy.
- CIFAR-10, 400 labels per class: 73.8% accuracy.
- SVHN, 1000 labels: 22.48% error.
- MNIST, 10M generated samples: 1.48% nearest-neighbor test error.
Table 1: CIFAR-10 classification using pretrained discriminator features
| Model | Accuracy | Accuracy (400 per class) | Max # of feature units |
|---|---|---|---|
| 1 Layer K-means | 80.6% | 63.7% (±0.7%) | 4800 |
| 3 Layer K-means Learned RF | 82.0% | 70.7% (±0.7%) | 3200 |
| View Invariant K-means | 81.9% | 72.6% (±0.7%) | 6400 |
| Exemplar CNN | 84.3% | 77.4% (±0.2%) | 1024 |
| DCGAN (ours) + L2-SVM | 82.8% | 73.8% (±0.4%) | 512 |

Ablations
- Pooling removal: learned strided convolutions improve training stability.
- Fully connected removal: deeper GANs train more reliably.
- Momentum sweep: β1 = 0.9 oscillates; β1 = 0.5 stabilizes.
- Extended training: some filters collapse into oscillatory modes.
Method Strengths and Weaknesses
Strengths
- Architectural rules are simple and reproducible.
- Discriminator features reach 82.8% CIFAR-10 accuracy without CIFAR pretraining.
- Few-label transfer is strong: 73.8% with 400 labels per class.
- LSUN samples show consistent global room structure at 64×64 resolution.
Weaknesses
- Training still shows oscillation and occasional filter collapse.
- No calibrated generative metric like FID or likelihood is reported.
- Full-label CIFAR-10 trails Exemplar CNN (82.8% vs. 84.3%).
- Design rules are empirical, not derived from GAN optimization theory.
Suggestions from the authors
- Extend the approach to video frame prediction.
- Extend learned features to audio and speech synthesis.
- Study latent-space structure more systematically.
- Develop vector arithmetic for conditional generation with less data.
Links
Prior Papers
- @goodfellowGAN2014 — introduces the adversarial objective that DCGAN stabilizes with convolutional design.
- @krizhevskyAlexNet2012 — supplies CNN design motifs that DCGAN adapts to unsupervised adversarial training.
Further Papers
- @karrasStyleGAN2019 — extends GAN generator design toward finer architectural control and higher-fidelity synthesis.