Gradient-based learning applied to document recognition

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner

1998 · Proceedings of the IEEE


Problem

Framing

OCR systems still depended on hand-built segmentation and features, which broke under shifts, distortions, and varied handwriting. The paper closes that gap with an end-to-end convolutional recognizer that learns features and classifier jointly from pixels, reaching about $0.8\%$ error on handwritten-digit recognition.

Currently Used Methods

Foundational

@rosenblattPerceptron1958 — trainable linear threshold classifier for pattern recognition.
- Limitation in context: no hierarchy, locality, or deformation tolerance.
@rumelhartLearningRepresentationsBackpropagating1986 — backpropagation enables multilayer feature learning.
- Limitation in context: dense MLPs ignore image geometry and over-parameterize.
A Theory of the Learnable — PAC (probably approximately correct) learning framework.
- Limitation in context: learning theory takes the input representation as given, so practical systems still depend on hand-crafted document features.
Backpropagation Applied to Handwritten Zip Code Recognition — earlier convolutional zip-code reader.
- Limitation in context: narrower scope than full document-recognition pipelines.

Proposed Method

Architecture

LeNet-5 takes a $32 \times 32$ grayscale image and alternates convolution with subsampling before two effectively dense stages. The core widths are $\mathrm{C1}:6@28\times 28$, $\mathrm{S2}:6@14\times 14$, $\mathrm{C3}:16@10\times 10$, $\mathrm{S4}:16@5\times 5$, then $\mathrm{C5}:120$, $\mathrm{F6}:84$, and a 10-way output. C5 is formally a convolution, but its $5 \times 5$ kernels cover the whole $5 \times 5$ S4 maps, so it behaves like a fully connected layer.

Figure: LeNet-5 architecture with a $32 \times 32$ input, two convolution stages (C1, C3), two subsampling stages (S2, S4), then C5 (120), F6 (84), and a 10-class output.
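A minimal sketch of the size arithmetic behind these widths, assuming $5 \times 5$ stride-1 convolutions without padding and non-overlapping $2 \times 2$ subsampling. The helper names are illustrative, not from the paper:

```python
# Sketch: trace LeNet-5 feature-map sizes, assuming 5x5 "valid" convolutions
# (no padding, stride 1) and non-overlapping 2x2 subsampling.


def conv_out(size: int, kernel: int = 5) -> int:
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping subsampling."""
    return size // window


def lenet5_shapes(input_size: int = 32) -> list[tuple[str, int, int]]:
    """Return (layer name, number of maps, spatial size) for each stage."""
    shapes = []
    s = conv_out(input_size)
    shapes.append(("C1", 6, s))
    s = pool_out(s)
    shapes.append(("S2", 6, s))
    s = conv_out(s)
    shapes.append(("C3", 16, s))
    s = pool_out(s)
    shapes.append(("S4", 16, s))
    s = conv_out(s)  # 5x5 kernel on a 5x5 map -> 1x1, so C5 acts like a dense layer
    shapes.append(("C5", 120, s))
    return shapes


if __name__ == "__main__":
    for name, maps, size in lenet5_shapes():
        print(f"{name}: {maps}@{size}x{size}")
```

Running it prints `C1: 6@28x28` down to `C5: 120@1x1`; the final $1 \times 1$ size is why C5 can be read as a dense stage.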

Loss / Objective

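The network is trained end to end by supervised gradient descent on labeled examples; in the objective below, $\ell$ is a generic per-example loss rather than the paper's exact criterion.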

$$\mathcal{L}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\ell\big(f_{\theta}(\mathbf{x}_i),y_i\big)$$
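A plain-Python sketch of this empirical risk. The model $f_\theta$ and the per-example loss $\ell$ are placeholders: squared error against a one-hot target is an assumption chosen for illustration, not the paper's criterion.

```python
# Sketch of L(theta) = (1/N) * sum_i loss(f_theta(x_i), y_i).
# ASSUMPTION: squared error to a one-hot target stands in for the per-example
# loss; the paper's exact criterion is not reproduced here.
from typing import Callable, Sequence

Vector = list[float]


def squared_error(scores: Vector, label: int) -> float:
    """Hypothetical per-example loss: squared distance to a one-hot target."""
    target = [1.0 if k == label else 0.0 for k in range(len(scores))]
    return sum((s - t) ** 2 for s, t in zip(scores, target))


def empirical_risk(
    model: Callable[[Vector], Vector],
    xs: Sequence[Vector],
    ys: Sequence[int],
    loss: Callable[[Vector, int], float] = squared_error,
) -> float:
    """Average the per-example loss over the N training pairs."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need the same, non-zero number of inputs and labels")
    return sum(loss(model(x), y) for x, y in zip(xs, ys)) / len(xs)
```

For instance, with an identity model, inputs `[[1.0, 0.0], [0.0, 0.0]]` and labels `[0, 1]`, the per-example losses are 0 and 1, so the risk is 0.5. Gradient descent then adjusts $\theta$ to lower this average.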

Algorithm

Inference is a single forward pass from pixels to class scores.

$$\hat{y}=\arg\max_{k} f_{\theta}(\mathbf{x})_k$$
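A sketch of the decision rule. The generic version takes the argmax over class scores. LeNet-5's output layer, as described in the paper, uses Euclidean RBF units whose outputs act as penalties (distance to a class prototype), so the equivalent rule there is an argmin. The function names are hypothetical, and any prototypes passed in are toy values rather than the paper's 84-dimensional targets.

```python
# Sketch of y_hat = argmax_k f_theta(x)_k, plus the matching rule for
# penalty-style outputs (assumed to mirror LeNet-5's Euclidean RBF output
# units, where a smaller output means a better fit).


def predict_argmax(scores: list[float]) -> int:
    """Index of the highest class score."""
    return max(range(len(scores)), key=lambda k: scores[k])


def rbf_outputs(features: list[float], prototypes: list[list[float]]) -> list[float]:
    """Squared Euclidean distance from the feature vector to each class prototype."""
    return [sum((f - w) ** 2 for f, w in zip(features, proto)) for proto in prototypes]


def predict_rbf(features: list[float], prototypes: list[list[float]]) -> int:
    """With penalty-style outputs, the predicted class is the argmin."""
    penalties = rbf_outputs(features, prototypes)
    return min(range(len(penalties)), key=lambda k: penalties[k])
```

Negating the penalties turns them into scores, so both functions implement the same decision up to sign.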

Training Procedure

Input: $32 \times 32$ grayscale images.
Feature maps: $6 \rightarrow 16$ in the first two convolution blocks.
Hidden units: $120 \rightarrow 84$ before output.
Output classes: 10 for digit recognition.

Evaluation

Datasets

Handwritten digit recognition (MNIST, the modified NIST set assembled for this paper).
Check reading.
Document field recognition.

Metrics

Classification error rate (%).
End-to-end field recognition accuracy.
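A sketch of the error-rate metric, assuming predictions and labels are plain lists of class indices (`error_rate_percent` is a hypothetical helper name):

```python
# Sketch: classification error rate in percent, assuming predictions and
# labels are equal-length sequences of class indices.
from typing import Sequence


def error_rate_percent(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of examples whose predicted class differs from the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return 100.0 * wrong / len(labels)
```

On a 10,000-image test set, 80 misclassified images give an error rate of 0.8%.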

Headline results

Handwritten digits: about $0.8\%$ error.
K-NN Euclidean: $5.0\%$ error.
Deslanted K-NN Euclidean: $2.4\%$ error.
In the paper's comparison table, LeNet variants reach lower error than the classical nearest-neighbor baselines.

Ablations

Local receptive fields vs dense MLP: fewer parameters and better image modeling.
Shared convolutions vs hand-built features: learned features improve recognition.
Distortion-aware training: robustness improves under writing variation.
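Distortion-aware training can be approximated with random shifts of the training images. The paper's distortions were random planar affine transformations (translation, scaling, squeezing, shearing); this sketch covers integer translation only, a deliberate simplification, and `shift_image` / `random_shift` are hypothetical helpers.

```python
# Sketch of translation-only augmentation (a simplification of the paper's
# random affine distortions). Images are lists of rows of floats.
import random
from typing import Optional

Image = list[list[float]]


def shift_image(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate img by dx columns and dy rows; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


def random_shift(img: Image, max_shift: int = 2, rng: Optional[random.Random] = None) -> Image:
    """Apply a random shift of up to max_shift pixels along each axis."""
    if rng is None:
        rng = random.Random()
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_image(img, dx, dy)
```

Calling `random_shift` on each image every epoch shows the network many slightly displaced copies of the same digit, which is the kind of variation the convolution and subsampling layers are meant to tolerate.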

Method Strengths and Weaknesses

Strengths

Replaces handcrafted OCR pipelines with end-to-end learning.
Weight sharing cuts parameters relative to dense image MLPs.
Architecture encodes translation tolerance through convolution and subsampling.
Reports about $0.8\%$ digit error, well below the K-NN baselines ($5.0\%$ and $2.4\%$).
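A worked count for layer C1 alone shows the effect of weight sharing. It assumes one $32 \times 32$ input map, six $5 \times 5$ kernels, and one bias per feature map, which reproduces the 156 trainable parameters reported for C1. The unshared and dense variants are hypothetical layers producing the same $6 \times 28 \times 28$ output.

```python
# Sketch: parameter counts for a C1-sized layer under three connectivity
# assumptions (shared convolution, unshared local fields, fully dense).


def conv_params(in_maps: int, out_maps: int, k: int) -> int:
    """Shared kernels: each output map has in_maps*k*k weights plus one bias."""
    return out_maps * (in_maps * k * k + 1)


def local_unshared_params(out_units: int, in_maps: int, k: int) -> int:
    """Same k x k receptive fields, but every output unit has its own weights and bias."""
    return out_units * (in_maps * k * k + 1)


def dense_params(n_in: int, n_out: int) -> int:
    """Every output unit connects to every input pixel, plus one bias."""
    return n_out * (n_in + 1)


C1_UNITS = 6 * 28 * 28  # 4704 output units

if __name__ == "__main__":
    print("shared conv:    ", conv_params(1, 6, 5))
    print("unshared local: ", local_unshared_params(C1_UNITS, 1, 5))
    print("fully dense:    ", dense_params(32 * 32, C1_UNITS))
```

This gives 156 parameters with sharing, 122,304 without sharing (the same as C1's connection count), and 4,821,600 for a dense layer of the same output size.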

Weaknesses

The exact loss formula used in the paper is not captured in these notes.
Fixed $32 \times 32$ input constrains variable document layouts.
The shallow architecture (two convolution stages) limits representational capacity compared with later, deeper CNNs.
Evaluation summary here is strongest on digits, weaker on broader documents.

Suggestions from the authors

Extend trainable recognition from isolated characters to full document fields.
Improve robustness to geometric distortion and handwriting variation.
Integrate segmentation, recognition, and language constraints jointly.
Scale convolutional readers to richer document structures.

1. Summary

Motivation / Problem

Traditional OCR / document recognition pipelines require heavy hand-crafted preprocessing and manually designed features.
The classical ML approach therefore carries a heavy feature-engineering burden.

Prior Work and Its Limitations

ML (Hand-Engineered Feature Extractor + Classical ML Classifier)
- Feature Extractor + K-NN, PCA/quadratic methods, RBFs, SVMs
- Limitation
  - The hand-engineered feature extractor needs domain-specific feature engineering, which is labor-intensive
  - Cannot handle shifts and distortions; such inputs end up as outliers for the classifier
MLP
- Cannot capture the local 2D structure of images.

Proposed Method

LeNet - Convolutional Neural Network
- Use a convolutional neural network so that gradient-based learning also covers automatic feature extraction, not just classification
- ![[@lecunGradientbasedLearningApplied1998_LeNet5.png]]
- Convolution and subsampling (trainable 2x2 averaging) layers, each followed by a scaled tanh squashing function (not ReLU), capture the local structure of 2D images.
  - A plain MLP cannot exploit the local connectivity of 2D images.
- Gradients flow end to end, from the classifier back through the feature extractor, so both are trained jointly.
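One convolution, squash, subsample, squash step can be sketched in plain Python. It assumes the squashing function from the paper's appendix, $f(a) = A\tanh(Sa)$ with $A = 1.7159$ and $S = 2/3$, and subsampling units that sum a $2 \times 2$ block, multiply by a trainable coefficient, and add a trainable bias. Kernel values and helper names are illustrative only.

```python
# Sketch of one LeNet-style feature-extraction step on a single map:
# valid convolution -> scaled tanh -> 2x2 subsampling -> scaled tanh.
import math

Map = list[list[float]]

A, S = 1.7159, 2.0 / 3.0  # scaled-tanh constants (chosen so f(1) is about 1)


def squash(a: float) -> float:
    """Scaled tanh f(a) = A * tanh(S * a)."""
    return A * math.tanh(S * a)


def apply(fn, x: Map) -> Map:
    """Apply a scalar function element-wise to a 2D map."""
    return [[fn(v) for v in row] for row in x]


def conv2d_valid(x: Map, kernel: Map, bias: float = 0.0) -> Map:
    """Stride-1 'valid' cross-correlation of one map with one shared kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            bias + sum(x[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def subsample(x: Map, coeff: float = 0.25, bias: float = 0.0) -> Map:
    """Sum each non-overlapping 2x2 block, scale by coeff, add bias."""
    return [
        [
            coeff * (x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) + bias
            for c in range(0, len(x[0]) - 1, 2)
        ]
        for r in range(0, len(x) - 1, 2)
    ]


def conv_subsample_step(x: Map, kernel: Map, conv_bias: float = 0.0,
                        coeff: float = 0.25, sub_bias: float = 0.0) -> Map:
    """32x32 input with a 5x5 kernel -> 28x28 -> 14x14, squashing after each stage."""
    c = apply(squash, conv2d_valid(x, kernel, conv_bias))
    return apply(squash, subsample(c, coeff, sub_bias))
```

Because the same kernel and the same coefficient are reused at every position, gradients from the classifier reach these few shared parameters directly, which is what makes the feature extractor trainable end to end.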

Hypothesis and Evaluation

Hypothesis
- LeNet can learn task-specific features jointly with the classifier, and can match or beat current hand-crafted pipelines.
Evaluation
- Handwritten character / digit recognition benchmarks
- Document-recognition system (check reading)

2. Paper Strengths and Weaknesses

Strengths

Learns features without handcrafting
Feature extraction and classifier are trained jointly
Can capture image structure through locality and weight sharing

Weaknesses

Model is shallow and is only demonstrated on relatively simple tasks (small, size-normalized character images)

3. My Opinion

Overall Rating

Strong Accept

Recommendation Justification

This paper plays a historically important role: it established trainable convolutional feature extraction for images, which later deep learning built on.

Detailed Comments

This paper is important because it shifts the paradigm from handcrafted feature-extraction pipelines to trainable end-to-end systems.
