Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

2016 · IEEE

Problem

Framing

Deeper plain CNNs showed higher training error once depth passed the easy-to-optimize regime, so the degradation is an optimization problem rather than overfitting. The paper addresses this degradation by reformulating each block to learn a residual, $H(\mathbf{x}) = F(\mathbf{x}) + \mathbf{x}$, which enables 152-layer ImageNet models and an ensemble top-5 error of 3.57% on the ImageNet test set.

Currently Used Methods

Foundational

Proposed Method

Architecture

The network replaces plain stacks with residual blocks that add a shortcut to a learned branch. ResNet-18/34 use two $3 \times 3$ layers per block. ResNet-50/101/152 use a bottleneck $1 \times 1 \rightarrow 3 \times 3 \rightarrow 1 \times 1$ block, with identity shortcuts when dimensions match and projection shortcuts when they do not.

Figure: Residual block variants for ImageNet. Left: the two-layer basic block on $56 \times 56$ feature maps. Right: the bottleneck block with $1 \times 1$, $3 \times 3$, and $1 \times 1$ convolutions plus shortcut addition.
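To make the difference between the two block designs concrete, the sketch below counts convolution weights for each. The channel widths (64 for the basic block, 256 reduced to 64 for the bottleneck) are assumptions chosen for illustration and are not stated in this summary. Biases and normalization parameters are ignored, and the helper names are hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of a kernel x kernel convolution (no bias, no batch norm)."""
    return kernel * kernel * c_in * c_out


def basic_block_weights(channels):
    """Two 3x3 convolutions at a constant channel width."""
    return 2 * conv_weights(3, channels, channels)


def bottleneck_weights(outer, inner):
    """1x1 reduce -> 3x3 at the reduced width -> 1x1 restore."""
    return (
        conv_weights(1, outer, inner)
        + conv_weights(3, inner, inner)
        + conv_weights(1, inner, outer)
    )


# Assumed widths, for illustration only.
basic_64 = basic_block_weights(64)           # basic block on 64 channels
bottleneck_256 = bottleneck_weights(256, 64)  # bottleneck on 256 channels
basic_256 = basic_block_weights(256)         # basic block widened to 256 channels

print(basic_64, bottleneck_256, basic_256)  # 73728 69632 1179648
```

Under these assumed widths, the bottleneck processes a 256-channel input with roughly the same number of weights as a 64-channel basic block, whereas two $3 \times 3$ layers applied directly at 256 channels would need about 17 times more.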

Loss / Objective

The paper keeps the standard classification loss and changes the block parameterization:

$$\mathbf{y} = F(\mathbf{x}, \{W_i\}) + \mathbf{x}$$

For dimension mismatch, the shortcut becomes:

$$\mathbf{y} = F(\mathbf{x}, \{W_i\}) + W_s \mathbf{x}$$
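The two shortcut forms can be illustrated on plain vectors. This is a minimal sketch, not the paper's convolutional implementation: the residual branch $F$ is an arbitrary callable, $W_s$ is a small dense matrix, and the names `matvec` and `residual_output` are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def residual_output(x, branch, W_s=None):
    """Return F(x) + x, or F(x) + W_s x when a projection matrix is given."""
    fx = branch(x)
    shortcut = x if W_s is None else matvec(W_s, x)
    if len(fx) != len(shortcut):
        raise ValueError("branch and shortcut dimensions must match")
    return [f + s for f, s in zip(fx, shortcut)]


# Matching dimensions: identity shortcut, no extra parameters.
same = residual_output([1.0, 2.0], lambda x: [0.5 * v for v in x])

# Dimension change 2 -> 3: the shortcut needs a projection W_s.
W_branch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy branch weights (assumed)
W_s = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]       # toy projection (assumed)
changed = residual_output([1.0, 2.0], lambda x: matvec(W_branch, x), W_s)

print(same)     # [1.5, 3.0]
print(changed)  # [2.0, 4.0, 3.0]
```

Omitting `W_s` when the branch changes the dimension raises an error, which mirrors why the projection shortcut is needed only at dimension changes.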

Algorithm

Each unit computes a residual branch, adds the shortcut, then applies ReLU:

$$\mathbf{x}_{l+1} = \mathrm{ReLU}\left(F(\mathbf{x}_l, W_l) + \mathbf{x}_l\right)$$
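The update rule above can be sketched on vectors to show why extra residual units need not hurt: if a unit's residual branch outputs zero, the unit passes a nonnegative input through unchanged. The code below is an illustrative toy with identity shortcuts only. The function names are hypothetical, and the zero branch stands in for a branch whose weights have been driven to zero.

```python
def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]


def residual_unit(x, branch):
    """One unit: x_next = ReLU(F(x) + x), with an identity shortcut."""
    return relu([f + a for f, a in zip(branch(x), x)])


def run_stack(x, branches):
    """Apply residual units in sequence, one per branch."""
    for branch in branches:
        x = residual_unit(x, branch)
    return x


def zero_branch(x):
    """A residual branch that has learned F(x) = 0."""
    return [0.0 for _ in x]


x0 = [0.3, 0.0, 1.2]  # nonnegative, as it would be after a preceding ReLU
out = run_stack(x0, [zero_branch] * 10)
print(out == x0)  # True: ten zero-residual units act as the identity
```

The identity behaviour relies on the input being nonnegative, because the ReLU after the addition would clip negative entries.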

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers

1. Summary

Motivation / Problem

Prior Work and Its Limitations

Proposed Method

Hypothesis and Evaluation


2. Paper Strengths and Weaknesses

Strengths

Weaknesses


3. My Opinion

Overall Rating

Recommendation Justification

Detailed Comments