Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

2015

Problem

Framing

Deep networks are hard to optimize because the distribution of each layer's inputs shifts during training as the parameters of the preceding layers change (internal covariate shift). The paper inserts a differentiable mini-batch normalization step with a learned scale and shift, which permits much larger learning rates and reduces the number of training steps needed to reach $72.2\%$ ImageNet accuracy from $31.0 \cdot 10^6$ to $13.3 \cdot 10^6$.

Currently Used Methods

Foundational

Proposed Method

Architecture

BatchNorm standardizes an activation $x$ over the mini-batch, then restores representational power with a learned scale $\gamma$ and shift $\beta$. In convolutional layers, one mean and variance pair is computed per feature map, shared across all spatial positions and all examples in the mini-batch.
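The convolutional case can be illustrated with a minimal sketch in plain Python. It assumes activations stored as a nested list shaped [N][C][H][W]; that layout and the function name `conv_batch_moments` are hypothetical choices for illustration, not taken from the paper. For each feature map, the mean and variance are pooled over the batch and both spatial axes, so a batch of N examples with H x W maps contributes N * H * W values per channel.

```python
def conv_batch_moments(x):
    """Per-feature-map moments for activations shaped [N][C][H][W].

    Each channel's mean and (biased) variance are computed jointly over
    all examples in the mini-batch and all spatial positions.
    """
    n_channels = len(x[0])
    means, variances = [], []
    for c in range(n_channels):
        # Flatten every value of channel c across batch, height, and width.
        values = [v for sample in x for row in sample[c] for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        means.append(mu)
        variances.append(var)
    return means, variances


# Two examples, two channels, 1 x 2 feature maps.
x = [
    [[[1.0, 2.0]], [[5.0, 5.0]]],
    [[[3.0, 4.0]], [[5.0, 5.0]]],
]
means, variances = conv_batch_moments(x)
```

Here channel 0 pools the four values 1, 2, 3, 4 into a single mean and variance, rather than keeping separate statistics per spatial position.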

Algorithm 1: the batch-normalizing transform computes mini-batch mean, mini-batch variance, normalization, then learned scale and shift.

Loss / Objective

The task loss is unchanged; the method reparameterizes intermediate activations.

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_B)^2$$

$$\hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$
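The transform can be sketched for a single scalar activation over one mini-batch. This is a minimal illustration in plain Python, not the authors' implementation; the function name `batch_norm_train` and the default `eps=1e-5` are assumptions made for the example.

```python
import math


def batch_norm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize the m values xs of one activation (training mode).

    Returns the transformed values together with the mini-batch mean and
    the biased mini-batch variance used to compute them.
    """
    m = len(xs)
    mu_b = sum(xs) / m                              # mini-batch mean
    var_b = sum((x - mu_b) ** 2 for x in xs) / m    # mini-batch variance (1/m)
    inv_std = 1.0 / math.sqrt(var_b + eps)
    x_hat = [(x - mu_b) * inv_std for x in xs]      # normalize
    ys = [gamma * xh + beta for xh in x_hat]        # learned scale and shift
    return ys, mu_b, var_b


ys, mu_b, var_b = batch_norm_train([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=1.0)
```

After the transform, the outputs have mean `beta` and standard deviation close to `gamma` (slightly smaller because of `eps`), regardless of the scale of the inputs.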

Algorithm

Training uses batch moments; inference replaces them with frozen population estimates.

$$\mathbb{E}[x] \leftarrow \mathbb{E}_B[\mu_B], \qquad \mathrm{Var}[x] \leftarrow \frac{m}{m-1}\,\mathbb{E}_B[\sigma_B^2]$$

$$y = \frac{\gamma}{\sqrt{\mathrm{Var}[x]+\epsilon}} \cdot x + \left(\beta - \frac{\gamma\,\mathbb{E}[x]}{\sqrt{\mathrm{Var}[x]+\epsilon}}\right)$$

Algorithm 2: BN is inserted into selected activations during training, then replaced at inference by a fixed affine map using averaged population moments.
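The inference-time replacement can be sketched as two small steps: estimate population moments from recorded per-batch moments, then fold them with $\gamma$ and $\beta$ into a single scale and shift. The function names below are hypothetical, and a plain average over recorded batches is assumed here as a stand-in for whatever averaging scheme is used in practice.

```python
import math


def population_moments(batch_means, batch_vars, m):
    """Estimate E[x] and Var[x] from per-batch moments (batch size m > 1)."""
    mean = sum(batch_means) / len(batch_means)
    # m / (m - 1) turns the biased batch variances into an unbiased estimate.
    var = (m / (m - 1)) * sum(batch_vars) / len(batch_vars)
    return mean, var


def fold_inference_affine(gamma, beta, mean, var, eps=1e-5):
    """Collapse normalization plus scale and shift into y = scale * x + shift."""
    scale = gamma / math.sqrt(var + eps)
    shift = beta - scale * mean
    return scale, shift


mean, var = population_moments([1.0, 3.0], [2.0, 4.0], m=4)
scale, shift = fold_inference_affine(gamma=2.0, beta=0.5, mean=mean, var=var)
y = scale * 5.0 + shift  # output for a single input, independent of any batch
```

Because `scale` and `shift` are constants once training ends, the output for an input no longer depends on the other examples in its batch.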

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers

1. Summary

Motivation / Problem

Prior Work and Its Limitations

Proposed Method

Hypothesis and Evaluation


2. Paper Strengths and Weaknesses

Strengths

Weaknesses


3. My Opinion

Overall Rating

Recommendation Justification

Detailed Comments