Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

2015 · ICCV


Problem

Framing

Very deep rectifier CNNs remain hard to train because Xavier scaling assumes symmetric responses and so underestimates the variance lost after ReLU. The paper addresses this with a learned rectifier, PReLU, and a rectifier-aware initialization rule, and reports 5.71% top-5 single-model error on ImageNet.

Currently Used Methods

Foundational

Proposed Method

Architecture

The paper keeps standard CNN backbones and changes only the activation. PReLU learns a negative slope $a_i$, either one per channel or one shared per layer, which adds a negligible number of parameters relative to the convolution weights. The learned slopes are larger in early layers and smaller in deep layers.

Convergence plot for a 22-layer ImageNet model: the proposed rectifier-aware initialization reduces top-1 error earlier than Xavier, while both eventually converge.

Loss / Objective

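The modeling change is the learned rectifier itself: PReLU passes positive inputs $y_i$ unchanged and scales negative inputs by a learned slope $a_i$, reducing to ReLU when $a_i = 0$.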

$$f(y_i) = \max(0, y_i) + a_i \min(0, y_i)$$
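The activation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `prelu`, the `(batch, channels, ...)` layout, and the example slope values are assumptions made for the example.

```python
import numpy as np

def prelu(y, a):
    """PReLU forward pass: max(0, y) + a * min(0, y).

    y: pre-activations, assumed layout (batch, channels, ...).
    a: scalar slope shared by the layer, or array of shape (channels,)
       holding one slope per channel.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        # Reshape (channels,) to (1, channels, 1, ...) so it broadcasts over y.
        a = a.reshape((1, -1) + (1,) * (y.ndim - 2))
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

# Two samples, two channels; the slopes 0.25 and 0.5 are illustrative values.
y = np.array([[-2.0, 3.0], [1.0, -4.0]])
out = prelu(y, np.array([0.25, 0.5]))
relu_out = prelu(y, 0.0)  # a = 0 recovers plain ReLU
```

Positive entries pass through unchanged, while negative entries are multiplied by the slope of their channel. Passing a scalar `a` gives the per-layer variant.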

Algorithm

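The initialization keeps the variance of the responses constant across rectifier layers. With $n_l$ the number of input connections to a unit in layer $l$ and $w_l$ its zero-mean weights, the condition for ReLU is: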

$$\frac{1}{2} n_l \operatorname{Var}[w_l] = 1$$
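A small numerical experiment can make the ReLU condition concrete. The sketch below pushes random inputs through a stack of random fully connected ReLU layers and compares weights drawn with $n_l \operatorname{Var}[w_l] = 2$ (the condition above) against $n_l \operatorname{Var}[w_l] = 1$, used here as a Xavier-style fan-in baseline. The depth, width, batch size, and function name are arbitrary choices for illustration, and fully connected layers stand in for convolutions.

```python
import numpy as np

def final_mean_square(scale, depth=30, width=512, batch=256, seed=0):
    """Mean squared pre-activation of the last layer of a random ReLU stack.

    Weights are zero-mean Gaussian with Var[w] = scale / width, so
    scale = n_l * Var[w_l] with n_l = width (fan-in).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    y = x
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * np.sqrt(scale / width)
        y = x @ w                # pre-activation of this layer
        x = np.maximum(0.0, y)   # ReLU zeroes roughly half of the responses
    return float(np.mean(y ** 2))

he_ms = final_mean_square(scale=2.0)      # (1/2) * n_l * Var[w_l] = 1
xavier_ms = final_mean_square(scale=1.0)  # n_l * Var[w_l] = 1

print(f"scale 2: {he_ms:.3e}   scale 1: {xavier_ms:.3e}")
```

Under the second rule each ReLU layer roughly halves the mean squared response, so after 30 layers the signal shrinks by a factor on the order of $2^{30}$, while the first rule keeps it at roughly the same magnitude.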

For PReLU with initial slope $a$:

$$\frac{1}{2}(1+a^2) n_l \operatorname{Var}[w_l] = 1$$
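Solving the condition for the weight standard deviation gives $\sqrt{2 / ((1+a^2)\, n_l)}$, which the sketch below turns into a weight initializer. It assumes $n_l$ is the fan-in of a convolution, `kernel_size * kernel_size * in_channels`, and draws weights from a zero-mean Gaussian. The function names, tensor layout, and the slope value 0.25 are hypothetical choices for the example.

```python
import math
import numpy as np

def rectifier_init_std(fan_in, a=0.0):
    """Std that satisfies (1/2) * (1 + a^2) * fan_in * Var[w] = 1."""
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))

def init_conv_weights(out_channels, in_channels, kernel_size, a=0.0, seed=0):
    """Zero-mean Gaussian conv weights, shape (out, in, k, k).

    fan_in is assumed to be kernel_size^2 * in_channels.
    """
    fan_in = kernel_size * kernel_size * in_channels
    std = rectifier_init_std(fan_in, a)
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, std, size=(out_channels, in_channels, kernel_size, kernel_size))

# a = 0 gives the ReLU rule; a = 1 gives the linear case, Var[w] = 1 / fan_in.
std_relu = rectifier_init_std(100, a=0.0)
std_linear = rectifier_init_std(100, a=1.0)

# Example layer: 3x3 convolution, 32 -> 64 channels, illustrative slope 0.25.
w = init_conv_weights(64, 32, 3, a=0.25)
```

Setting `a = 0` reproduces the ReLU condition, and a larger initial slope calls for a slightly smaller weight variance because less of the signal is suppressed.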

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 6: ReLU vs PReLU on ImageNet validation for model A at different test scales.

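| model A, scale $s$ | ReLU top-1 (%) | ReLU top-5 (%) | PReLU top-1 (%) | PReLU top-5 (%) |
|---|---|---|---|---|
| 256 | 26.25 | 8.25 | 25.81 | 8.08 |
| 384 | 24.77 | 7.26 | 24.20 | 7.03 |
| 480 | 25.46 | 7.63 | 24.83 | 7.39 |
| multi-scale | 24.02 | 6.51 | 22.97 | 6.28 |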
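The gap between the two activations at each test scale can be read off Table 6 with a little arithmetic. The sketch below only recomputes differences from the numbers in the table; the variable names are illustrative.

```python
# Error rates (%) copied from Table 6:
# (ReLU top-1, ReLU top-5, PReLU top-1, PReLU top-5)
table6 = {
    "256": (26.25, 8.25, 25.81, 8.08),
    "384": (24.77, 7.26, 24.20, 7.03),
    "480": (25.46, 7.63, 24.83, 7.39),
    "multi-scale": (24.02, 6.51, 22.97, 6.28),
}

# Absolute error reduction of PReLU over ReLU, in percentage points.
gaps = {
    scale: (round(relu1 - prelu1, 2), round(relu5 - prelu5, 2))
    for scale, (relu1, relu5, prelu1, prelu5) in table6.items()
}

for scale, (top1_gap, top5_gap) in gaps.items():
    print(f"{scale:>11}: top-1 -{top1_gap:.2f}, top-5 -{top5_gap:.2f}")
```

PReLU has the lower error in every row, with the largest top-1 gap (1.05 points) in the multi-scale setting.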

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers