Instance Normalization: The Missing Ingredient for Fast Stylization
Instance Normalization: The Missing Ingredient for Fast Stylization
Problem
Framing
Feed-forward stylization was real-time but still trailed Gatys-quality transfer and degraded under larger training sets or longer optimization. The paper closes this gap with one architectural bias: replace batch normalization with instance normalization and keep it active at test time.
Currently Used Methods
Direct antecedents
- Image Style Transfer Using Convolutional Neural Networks — optimization-based perceptual style transfer with strongest visual quality.
- Limitation in context: several minutes per image, not real-time.
- Texture Networks: Feed-forward Synthesis of Textures and Stylized Images — feed-forward generators for fast fixed-style transfer.
- Limitation in context: quality degrades with many training images or long training.
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution — residual feed-forward stylization with perceptual losses.
- Limitation in context: reproduced model still improves after swapping normalization.
- @ioffeBatchNormalizationAccelerating2015 — batch-wise activation normalization for CNN optimization.
- Limitation in context: batch statistics retain contrast the generator should remove.
Proposed Method
Architecture
The paper keeps the feed-forward generator and swaps every batch-normalization layer for instance normalization. It tests the change in both the earlier Ulyanov generator and a reproduced Johnson residual generator, with normalization still applied at inference.

Loss / Objective
Training keeps the fixed-style perceptual objective:
Normalization Rule
The key change is per-instance, per-channel spatial normalization:
Training Procedure
- Fixed style image per generator.
- Content images , with .
- Noise seeds .
- Same hyperparameters as the batch-normalized baselines.
- Instance normalization active at train and test time.
Evaluation
Datasets
- Fixed style images, one per trained generator.
- Natural content-image collections for training and test stylization.
- Qualitative examples on portraits and scenes.
Metrics
- Qualitative visual comparison.
- Comparison against Gatys optimization-based transfer.
- Comparison of batch normalization versus instance normalization.
Headline results
- Gatys baseline: several minutes per image.
- Ulyanov generator: instance normalization removes severe border artifacts after long training.
- Johnson residual generator: the same swap yields similar qualitative gains.
- Cross-architecture comparison: both generators improve with instance normalization.
- Runtime: real-time inference on standard GPU hardware.

Ablations
- Normalization type: batch normalization to instance normalization drives the main visual gain.
- Architecture family: gains persist in both Ulyanov and Johnson generators.
- Padding choice: better padding alone does not remove the dominant border artifacts.
- Training scale: many images or long training hurt the original batch-normalized generator.
Method Strengths and Weaknesses
Strengths
- One normalization swap yields the central quality improvement.
- Gains transfer across two generator architectures.
- Test-time normalization matches the contrast-removal hypothesis.
- Preserves single-pass, real-time stylization.
Weaknesses
- Evaluation is almost entirely qualitative.
- No quantitative stylization metric is reported.
- Scope is limited to fixed-style transfer.
- Training hyperparameters are sparsely specified.
Suggestions from the authors
- Test instance normalization in discriminative vision models.
- Analyze why contrast removal simplifies image generation.
- Apply the same normalization change to other generators.
Links
Prior Papers
- @ioffeBatchNormalizationAccelerating2015 — batch normalization is the normalization scheme this paper replaces throughout the generator.
Further Papers
- @wuGroupNormalization2018 — group normalization extends batch-size-independent normalization in vision models.