Gradient-based learning applied to document recognition

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner

1998 · Proceedings of the IEEE

Problem

Framing

OCR systems still depended on hand-built segmentation and features, which broke under shifts, distortions, and varied handwriting. The paper closes that gap with an end-to-end convolutional recognizer that learns features and classifier jointly from pixels, reaching about 0.8% error on handwritten-digit recognition.

Currently Used Methods

Foundational

Proposed Method

Architecture

LeNet-5 takes a 32×32 grayscale image and alternates convolution with subsampling before two dense stages. The core widths are C1: 6@28×28, S2: 6@14×14, C3: 16@10×10, S4: 16@5×5, then C5: 120, F6: 84, and a 10-way output.

Figure: LeNet-5 architecture, with a 32×32 input, two convolution stages, two subsampling stages, then layers C5 (120 units), F6 (84 units), and a 10-class output.
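The feature-map sizes listed above can be checked with a little shape arithmetic. The sketch below is illustrative only: it assumes 5×5 convolution kernels applied with stride 1 and no padding, and 2×2 non-overlapping subsampling. Those settings are not stated in this summary, and the function names are hypothetical.

```python
def conv_out(size, kernel=5):
    """Spatial size after a stride-1 convolution with no padding (assumed 5x5 kernel)."""
    return size - kernel + 1


def subsample_out(size, window=2):
    """Spatial size after non-overlapping subsampling (assumed 2x2 window)."""
    return size // window


def lenet5_shapes(input_size=32):
    """Return (layer name, number of feature maps, spatial size) for C1 through C5."""
    # Map counts come from the text above; the op per layer is an assumption.
    layers = [
        ("C1", 6, conv_out),
        ("S2", 6, subsample_out),
        ("C3", 16, conv_out),
        ("S4", 16, subsample_out),
        ("C5", 120, conv_out),
    ]
    shapes = []
    size = input_size
    for name, maps, op in layers:
        size = op(size)
        shapes.append((name, maps, size))
    return shapes


if __name__ == "__main__":
    for name, maps, size in lenet5_shapes():
        print(f"{name}: {maps}@{size}x{size}")
```

Under these assumptions the 5×5 maps of S4 collapse to 1×1 at C5, which is consistent with C5 being listed as a plain width of 120 rather than as a map size.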

Loss / Objective

The network is trained by supervised gradient descent, minimizing the average per-example loss between its outputs and the target labels.

$$\mathcal{L}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\ell\big(f_{\theta}(\mathbf{x}_i),y_i\big)$$
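As a rough illustration of this objective, the sketch below averages a per-example loss over N training pairs. The per-example loss is not specified in this summary, so squared error against a one-hot target is used purely as a placeholder assumption. The names `empirical_loss` and `squared_error` are hypothetical, and `model` stands for any callable that returns a list of class scores.

```python
def squared_error(scores, label):
    """Placeholder per-example loss (an assumption): squared distance to a one-hot target."""
    return sum(
        (s - (1.0 if k == label else 0.0)) ** 2
        for k, s in enumerate(scores)
    )


def empirical_loss(model, inputs, labels, per_example_loss=squared_error):
    """Average the per-example loss over the N (input, label) pairs."""
    if len(inputs) != len(labels) or not inputs:
        raise ValueError("inputs and labels must be non-empty and of equal length")
    total = sum(per_example_loss(model(x), y) for x, y in zip(inputs, labels))
    return total / len(inputs)
```

Any other per-example loss with the same `(scores, label)` signature can be passed in through `per_example_loss` without changing the averaging step.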

Algorithm

Inference is a single forward pass from pixels to class scores.

$$\hat{y}=\arg\max_{k} f_{\theta}(\mathbf{x})_k$$
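The decision rule amounts to picking the index of the largest class score. A minimal sketch follows, assuming `model` is any callable that returns one score per class; the name `predict` is hypothetical.

```python
def predict(model, x):
    """Return the class index with the highest score from one forward pass."""
    scores = model(x)
    # On ties, max() returns the first maximal index, so the lowest class wins.
    return max(range(len(scores)), key=lambda k: scores[k])
```

For example, with a stand-in model that returns its input unchanged, `predict(lambda x: x, [0.1, 0.7, 0.2])` selects class 1.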

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers

1. Summary

Motivation / Problem

Prior Work and Its Limitations

Proposed Method

Hypothesis and Evaluation


2. Paper Strengths and Weaknesses

Strengths

Weaknesses


3. My Opinion

Overall Rating

Recommendation Justification

Detailed Comments