Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Problem
Framing
Region-based detectors were accurate, but proposal generation still cost seconds per image. Faster R-CNN replaces hand-engineered proposals with a Region Proposal Network (RPN) that shares convolutional features with Fast R-CNN, reducing proposal time to about 10 ms per image and reaching 73.2% mAP on VOC 2007 test with VGG-16 (trained on 07+12).
Currently Used Methods
Foundational
- @krizhevskyAlexNet2012 — deep CNN features make large-scale recognition practical.
- Limitation in context: not a detector or proposal mechanism.
- "Rich feature hierarchies for accurate object detection and semantic segmentation" — R-CNN scores warped external proposals.
- Limitation in context: proposal generation and per-region CNN passes are slow.
- "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition" — shares convolutional features across regions.
- Limitation in context: still relies on external proposal algorithms.
- "Fast R-CNN" — RoI pooling and shared features speed region classification.
- Limitation in context: Selective Search remains the runtime bottleneck.
- "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks" — sliding-window detection with joint classification and regression.
- Limitation in context: in the paper's ZF comparison on VOC 2007, one-stage localization trails two-stage accuracy by about 4.8 mAP points (53.9% vs. 58.7%).
Proposed Method
Architecture
The model places an RPN on the shared convolutional feature map and reuses those features for Fast R-CNN detection. A small sliding window feeds sibling heads for objectness and box regression over 9 anchors per location, formed from 3 scales and 3 aspect ratios.
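The anchor layout can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names `make_base_anchors` and `tile_anchors` are hypothetical, and the defaults (square sides of 128, 256, and 512 pixels, height-to-width ratios of 0.5, 1, and 2, and a feature stride of 16) are assumptions taken from the settings reported in the paper.

```python
import math


def make_base_anchors(sides=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (x1, y1, x2, y2) box per (side, ratio) pair, centered at the origin.

    Each anchor keeps the area side * side; ratio is height / width.
    """
    anchors = []
    for side in sides:
        area = float(side * side)
        for ratio in ratios:
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def tile_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Copy the base anchors to the center of every feature-map cell (image coordinates)."""
    tiled = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for x1, y1, x2, y2 in base_anchors:
                tiled.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return tiled


if __name__ == "__main__":
    base = make_base_anchors()
    # A feature map of about 40 x 60 cells corresponds to a roughly 600 x 1000 image at stride 16.
    print(len(base), len(tile_anchors(base, 40, 60)))
```

With these assumed defaults, each location carries 3 x 3 = 9 anchors, so a 40 x 60 feature map yields 21,600 anchors, in line with the "roughly 20,000" figure the paper quotes for a typical image.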

Loss / Objective
The RPN optimizes joint anchor classification and box regression.
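As a rough sketch of that objective, the code below sums a log loss over sampled anchors and a smooth L1 regression loss over positive anchors only, then combines the two with a balancing weight. The function name `rpn_loss`, its argument layout, and the defaults (`n_cls=256`, `n_reg=2400`, `lam=10.0`) are assumptions based on the normalization described in the paper, not values stated in this summary.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(probs, labels, deltas, targets, n_cls=256, n_reg=2400, lam=10.0):
    """Joint objectness and box-regression loss over sampled anchors.

    probs[i]   : predicted probability that anchor i is an object
    labels[i]  : 1 for a positive anchor, 0 for a negative one
    deltas[i]  : predicted box offsets (tx, ty, tw, th)
    targets[i] : ground-truth offsets, used only when labels[i] == 1
    """
    eps = 1e-12  # guards against log(0)
    cls_sum = 0.0
    reg_sum = 0.0
    for p, y, t, t_star in zip(probs, labels, deltas, targets):
        p_true = p if y == 1 else 1.0 - p
        cls_sum += -math.log(max(p_true, eps))
        if y == 1:  # regression is active only for positive anchors
            reg_sum += sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return cls_sum / n_cls + lam * reg_sum / n_reg
```

Gating the regression term on the label keeps background anchors from pulling the box head toward arbitrary targets.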
Sampling Rule / Algorithm
Anchor labels and proposal pruning are set by IoU thresholds and NMS.
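A minimal sketch of both rules follows, in plain Python. The helper names are hypothetical, and the thresholds are assumptions drawn from the paper's description: an anchor is positive if its IoU with some ground-truth box exceeds 0.7 or if it is the best match for a ground-truth box, negative if its IoU is below 0.3 for every ground-truth box, and ignored otherwise. The NMS here is the standard greedy variant with an IoU threshold of 0.7.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored) for each anchor."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row) if row else 0.0
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The best-overlapping anchor for each ground-truth box is positive as well,
    # so every object gets at least one positive anchor.
    for j in range(len(gt_boxes)):
        column = [row[j] for row in overlaps]
        if column and max(column) > 0:
            labels[column.index(max(column))] = 1
    return labels


def nms(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS: keep the highest-scoring boxes, dropping heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
            if len(keep) == top_n:
                break
    return keep
```

For example, `nms(boxes, scores, top_n=300)` returns the indices of at most 300 surviving proposals in descending score order.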
Training Procedure
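- Anchors per location: 9.
- Anchor scales: box areas of 128², 256², and 512² pixels.
- Anchor aspect ratios: 1:1, 1:2, and 2:1.
- RPN mini-batch: 256 anchors per image.
- Positive fraction: up to 50% positives (a 1:1 positive-to-negative ratio).
- Proposal NMS IoU: 0.7.
- Proposals after NMS: about 2000 for training, 300 at test time.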
- Feature-sharing optimization: 4-step alternating training.
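The mini-batch rule in the list above can be illustrated with a short sampler. This is a sketch under stated assumptions: the function name `sample_rpn_minibatch` is hypothetical, and the defaults (256 anchors per image, at most half of them positive, with negatives padding the batch when positives are scarce) follow the procedure described in the paper.

```python
import random


def sample_rpn_minibatch(labels, batch_size=256, max_pos_fraction=0.5, rng=None):
    """Pick anchor indices for one RPN mini-batch.

    labels: 1 = positive, 0 = negative, -1 = ignored.
    Returns (positive_indices, negative_indices).
    """
    rng = rng or random.Random()
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = min(len(pos), int(batch_size * max_pos_fraction))
    # When positives are scarce, negatives fill the rest of the batch.
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 1000 + [-1] * 50
    pos_idx, neg_idx = sample_rpn_minibatch(labels, rng=random.Random(0))
    print(len(pos_idx), len(neg_idx))
```

Capping positives rather than fixing their count keeps the batch size constant on images with few objects, where negatives would otherwise dominate the loss unchecked.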
Evaluation
Datasets
- PASCAL VOC 2007
- PASCAL VOC 2012
- MS COCO
Metrics
- mAP on VOC
- mAP@0.5 on COCO
- mAP@[0.5,0.95] on COCO
- Proposal time
- End-to-end inference rate
Headline results
- VOC 2007 test (VGG-16, 07+12): 73.2% mAP.
- VOC 2007 test (VGG-16, 07): 69.9% mAP.
- VOC 2012 test (VGG-16, 07++12): 70.4% mAP.
- COCO test-dev (VGG-16, trained on COCO trainval): 42.7% mAP@0.5.
- VGG-16 runtime: about 5 fps end-to-end; proposals cost about 10 ms per image.
Table 1: Proposal methods under Fast R-CNN on VOC 2007 test set
| train-time region proposals method | # boxes | test-time region proposals method | # proposals | mAP (%) |
|---|---|---|---|---|
| SS | 2000 | SS | 2000 | 58.7 |
| EB | 2000 | EB | 2000 | 58.6 |
| RPN+ZF, shared | 2000 | RPN+ZF, shared | 300 | 59.9 |
Ablations
- Anchor design: 3 scales and 3 aspect ratios reach 69.9% mAP.
- Single-anchor setting: mAP drops by 3 to 4 points.
- Proposal count: the top 100 RPN proposals alone remain competitive.
- Shared features: proposal quality improves from RPN+ZF to RPN+VGG.
Method Strengths and Weaknesses
Strengths
- Eliminates external proposals, reducing proposal time to about 10 ms per image.
- Shared convolutional features preserve accuracy while removing duplicated compute.
- Reaches 73.2% mAP on VOC 2007 test with VGG-16 (trained on 07+12).
- High-quality proposals need only 300 boxes at test time.
Weaknesses
- Four-step alternating training is cumbersome versus true joint optimization.
- Performance depends on anchor scales, ratios, and IoU heuristics.
- VGG-16 inference is only about 5 fps.
- Two-stage design is less direct than single-stage detectors.
Suggestions from the authors
- Replace alternating optimization with approximate joint training.
- Improve proposal quality with deeper shared backbones.
- Extend learned proposal mechanisms beyond object detection.
- Handle wider scale and aspect-ratio variation with stronger proposal designs.
Links
Prior Papers
- @krizhevskyAlexNet2012 — establishes the CNN feature regime that later region-based detectors exploit.
Further Papers
- @redmonYOLO2016 — explores the competing single-stage design point for higher detection speed.
- @heMaskRCNN2017 — extends Faster R-CNN with a mask branch for instance segmentation.