ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

2012 · NeurIPS

Problem

Framing

ImageNet made labeled data available at large scale, but CNNs had not yet been shown to train effectively on 1.2M high-resolution images spanning 1000 classes. The paper closes that gap with a deep CNN trained on GPUs, using ReLUs, data augmentation, and dropout, and reaches 17.0% top-5 error on ILSVRC-2010.

Currently Used Methods

Foundational

Proposed Method

Architecture

The network has five convolutional layers and three fully connected layers, with ReLU after every learned layer and a 1000-way softmax head. Input is a 224 × 224 × 3 crop. The conv stack is 96 kernels of 11 × 11 at stride 4, then 256 kernels of 5 × 5, then 384, 384, and 256 kernels of 3 × 3; the two dense hidden layers have 4096 units each. The model is split across two GPUs with limited cross-GPU connections.
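To make the layer sizes concrete, the sketch below traces the spatial size of the feature maps through the convolutional stack. The kernel counts, kernel sizes, and the stride of 4 come from the description above. The padding values and the 3 × 3, stride-2 max-pooling after the first, second, and fifth convolutional layers are assumptions drawn from common reimplementations, not from this summary, and the two-GPU split is ignored.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kind, out_channels, kernel, stride, pad)
# Padding values and pooling placement are assumptions, not from the summary.
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 2),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]

def trace_shapes(size=224, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, kind, out_ch, kernel, stride, pad in LAYERS:
        size = conv_out(size, kernel, stride, pad)
        if kind == "conv":
            channels = out_ch  # pooling keeps the channel count
        shapes.append((name, size, size, channels))
    return shapes

shapes = trace_shapes()
# Number of features fed to the first 4096-unit dense layer.
flat_features = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]

for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")
print("flattened:", flat_features)
```

Under these assumptions the maps shrink from 55 × 55 after the first convolution to 6 × 6 after the last pooling step, so the first dense layer sees 9216 inputs. A different padding choice would change these numbers.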

Architecture diagram: the AlexNet CNN split across two GPUs, with five convolutional stages, max-pooling, two dense hidden layers, and a 1000-way output.

Loss / Objective

Training maximizes the multinomial logistic regression objective, that is, the average log-probability of the correct label over the N training examples.

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \log p_{\theta}(y_i \mid \mathbf{x}_i)$$
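As a minimal illustration of this objective, the sketch below computes the average log-probability of the correct label from raw class scores, using a numerically stable log-softmax. The function names and the toy three-class logits are hypothetical and stand in for the network's 1000-way output.

```python
import math

def log_softmax(logits):
    """Log-probabilities from raw scores, shifted by the max for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def mean_log_likelihood(batch_logits, labels):
    """Average of log p(y_i | x_i) over the batch; training maximizes this."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total += log_softmax(logits)[y]
    return total / len(labels)

# Toy batch: two examples, three classes (illustrative values only).
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
objective = mean_log_likelihood(batch_logits, labels)
print(objective)
```

The value is always at most zero and approaches zero as the model puts more probability on the correct labels. Most frameworks minimize the negative of this quantity, which is the usual cross-entropy loss.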

Algorithm

Test-time prediction averages softmax outputs over ten crops.

$$\hat{p}(y \mid \mathbf{x}) = \frac{1}{10} \sum_{c \in \mathcal{C}_{10}} p_{\theta}(y \mid c(\mathbf{x}))$$
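The averaging step can be sketched as follows. The crop layout (four corners and the center, each with its horizontal mirror) follows the original paper's description; the 256-pixel source image size, the function names, and the toy probabilities are assumptions for illustration, and the network itself is left out.

```python
def ten_crop_boxes(image_size=256, crop=224):
    """Return (left, top, flipped) for 4 corner crops, the center crop, and their mirrors."""
    edge = image_size - crop
    mid = edge // 2
    origins = [(0, 0), (edge, 0), (0, edge), (edge, edge), (mid, mid)]
    return [(x, y, flipped) for flipped in (False, True) for (x, y) in origins]

def average_predictions(crop_probs):
    """Average per-crop softmax vectors into a single prediction."""
    n = len(crop_probs)
    num_classes = len(crop_probs[0])
    return [sum(p[k] for p in crop_probs) / n for k in range(num_classes)]

boxes = ten_crop_boxes()
# Stand-in for running the network on each crop: two hypothetical outputs.
crop_probs = [[0.7, 0.2, 0.1]] * 5 + [[0.5, 0.4, 0.1]] * 5
averaged = average_predictions(crop_probs)
print(len(boxes), averaged)
```

Because each per-crop vector sums to one, the averaged vector is still a valid probability distribution.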

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: ILSVRC-2010 test error comparison

Model               Top-1    Top-5
Sparse coding [2]   47.1%    28.2%
SIFT + FVs [24]     45.7%    25.7%
CNN                 37.5%    17.0%
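The Top-1 and Top-5 columns report the fraction of test images whose true label is not among the model's one or five highest-scoring classes. A minimal sketch of that computation, with a hypothetical helper name and toy three-class scores, could look like this:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    errors = 0
    for row, y in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if y not in ranked[:k]:
            errors += 1
    return errors / len(labels)

# Toy scores for two images over three classes (illustrative values only).
scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
]
labels = [2, 1]
top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

Top-k error can only stay the same or fall as k grows, which is why every row of the table shows a lower Top-5 than Top-1 figure.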

Ablations

Sample grid: first-layer convolutional filters, with many grayscale edge detectors and a few color-sensitive kernels.

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers