U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, Philipp Fischer, Thomas Brox

2015 · MICCAI

Problem

Framing

Biomedical segmentation needs pixel-accurate boundaries but typically has very few labeled images. U-Net addresses this by pairing a symmetric encoder-decoder with skip connections and a border-weighted loss, reaching an EM warping error of $0.000353$ and winning on two ISBI 2015 cell tracking challenge datasets.

Currently Used Methods

Foundational

@lecunGradientbasedLearningApplied1998 — convolutional learning for end-to-end visual feature extraction.
- Limitation in context: classification outputs do not yield dense pixel labels.
"Fully Convolutional Networks for Semantic Segmentation" — dense prediction by replacing classifiers with upsampling.
- Limitation in context: not tailored to very small biomedical datasets or to precise border localization.
"Deep neural networks segment neuronal membranes in electron microscopy images" — sliding-window CNN for pixelwise EM membrane prediction.
- Limitation in context: redundant patch evaluation is slow, and the patch size forces a trade-off between context and localization accuracy.

Proposed Method

Architecture

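U-Net uses a symmetric contracting and expansive path. Each down block applies two $3 \times 3$ valid (unpadded) convolutions, each followed by a ReLU, then $2 \times 2$ max-pooling with stride $2$; the number of channels doubles at each downsampling step. Each up block upsamples the feature map, applies a $2 \times 2$ up-convolution that halves the number of channels, concatenates the correspondingly cropped encoder features, and applies two $3 \times 3$ convolutions, each followed by a ReLU. The head is a $1 \times 1$ convolution that maps to the output classes, and the network has $23$ convolutional layers in total.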
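As a rough check of the layer arithmetic above, the sketch below traces the side length of a square tile through the network and counts convolutional layers. It assumes four down blocks and four up blocks (a depth that is consistent with the $23$-layer count but is not stated in this summary); the function names are illustrative and not from the paper's code.

```python
def unet_output_size(input_size: int, depth: int = 4) -> int:
    """Side length of the output map for a square input tile.

    Assumptions: every 3x3 valid convolution removes 2 pixels per axis,
    pooling halves the size, and each up block doubles it again.
    `depth` (number of down/up blocks) is an assumed parameter.
    """
    size = input_size
    for _ in range(depth):
        size -= 4  # two 3x3 valid convolutions
        if size <= 0 or size % 2 != 0:
            raise ValueError("tile size must stay positive and even before pooling")
        size //= 2  # 2x2 max-pooling with stride 2
    size -= 4  # two convolutions at the bottom of the "U"
    for _ in range(depth):
        size *= 2  # upsampling + 2x2 up-convolution
        size -= 4  # two 3x3 valid convolutions after concatenation
    if size <= 0:
        raise ValueError("input tile is too small for this depth")
    return size


def count_conv_layers(depth: int = 4) -> int:
    """Count convolutions: 2 per down block, 2 at the bottom,
    3 per up block (up-convolution + two 3x3), and the 1x1 head."""
    return 2 * depth + 2 + 3 * depth + 1


print(unet_output_size(572))  # 388
print(count_conv_layers())    # 23
```

The gap between input and output size (here $572 - 388 = 184$ pixels, or $92$ per side) is the context margin that each tile needs at inference time.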

Loss / Objective

Training uses pixelwise softmax cross-entropy with a class-balancing, border-emphasizing weight map.

$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}$$

$$E = -\sum_{x \in \Omega} w(x) \, \log p_{\ell(x)}(x)$$

$$w(x) = w_c(x) + w_0 \exp\!\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)$$
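The three formulas above can be sketched in NumPy as follows. In the paper, $d_1$ and $d_2$ are the distances from a pixel to the border of the nearest and second-nearest cell; here they are simply taken as precomputed arrays, and the array layout `(K, H, W)` and all function names are assumptions made for illustration.

```python
import numpy as np


def softmax(a: np.ndarray) -> np.ndarray:
    """Pixelwise softmax p_k(x) over the class axis of a (K, H, W) array."""
    shifted = a - a.max(axis=0, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


def border_weight(w_c, d1, d2, w0: float = 10.0, sigma: float = 5.0):
    """Weight map w(x) = w_c(x) + w0 * exp(-(d1 + d2)^2 / (2 sigma^2))."""
    return w_c + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def weighted_cross_entropy(a: np.ndarray, labels: np.ndarray, w: np.ndarray) -> float:
    """Energy E = -sum_x w(x) * log p_{l(x)}(x).

    a: activations (K, H, W); labels: integer class map (H, W); w: weights (H, W).
    """
    p = softmax(a)
    # Pick the probability of the true class at every pixel.
    p_true = np.take_along_axis(p, labels[None, ...], axis=0)[0]
    return float(-(w * np.log(p_true)).sum())
```

A pixel squeezed between two cells has small $d_1 + d_2$ and therefore a weight close to $w_c + w_0$, while a pixel far from any border keeps roughly the class-balancing weight $w_c$.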

Algorithm

Inference on large images uses overlap-tile prediction with mirrored border extrapolation so every output pixel has full valid-convolution context.
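A minimal sketch of the overlap-tile idea, under stated assumptions: `predict_tile` is a hypothetical stand-in for the trained network that maps an input tile of side `out_tile + 2 * margin` to an output of side `out_tile`, and mirroring is done with NumPy's `reflect` padding (the exact mirroring variant used by the authors is not given in this summary).

```python
import numpy as np


def overlap_tile_predict(image, predict_tile, out_tile: int, margin: int):
    """Predict a full (H, W) map tile by tile.

    The image is mirror-padded by `margin` on every side (plus extra padding
    so the output grid is a whole number of tiles), then each input tile is
    passed to `predict_tile`, which returns only the central region.
    """
    h, w = image.shape
    pad_h = (-h) % out_tile  # extra rows to reach a multiple of out_tile
    pad_w = (-w) % out_tile
    padded = np.pad(
        image,
        ((margin, margin + pad_h), (margin, margin + pad_w)),
        mode="reflect",
    )
    out = np.zeros((h + pad_h, w + pad_w), dtype=float)
    in_tile = out_tile + 2 * margin
    for y in range(0, h + pad_h, out_tile):
        for x in range(0, w + pad_w, out_tile):
            tile = padded[y:y + in_tile, x:x + in_tile]
            out[y:y + out_tile, x:x + out_tile] = predict_tile(tile)
    return out[:h, :w]  # drop the rows and columns added for tiling


# Stand-in "network" that just returns the valid central region of its input.
margin = 3
image = np.arange(100.0).reshape(10, 10)
result = overlap_tile_predict(
    image, lambda t: t[margin:-margin, margin:-margin], out_tile=4, margin=margin
)
```

Because neighbouring input tiles overlap by `2 * margin` pixels while output tiles do not overlap at all, the stitched result has no seams and memory use is bounded by the tile size rather than the image size.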

Training Procedure

Optimizer: stochastic gradient descent in Caffe
Batch size: $1$
Momentum: $0.99$
Augmentation: shifts, rotations, gray-value variations
Elastic deformation grid: $3 \times 3$
Elastic displacement std.: $10$ pixels
Dropout: end of the contracting path
Border-loss parameters: $w_0 = 10$, $\sigma \approx 5$ pixels
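The elastic deformation settings listed above can be illustrated with a small NumPy sketch. It draws Gaussian displacements on a coarse $3 \times 3$ grid and upsamples them to one displacement per pixel. Two simplifications are assumptions of this sketch and not the authors' method: the paper interpolates displacements bicubically, whereas bilinear interpolation is used here, and pixels are resampled with nearest-neighbour lookup so the same warp can be applied to a label mask. All function names are illustrative.

```python
import numpy as np


def bilinear_resize(grid: np.ndarray, h: int, w: int) -> np.ndarray:
    """Upsample a coarse (gh, gw) grid to (h, w) by bilinear interpolation."""
    gh, gw = grid.shape
    ys = np.linspace(0, gh - 1, h)
    xs = np.linspace(0, gw - 1, w)
    y0 = np.floor(ys).astype(int).clip(0, gh - 2)
    x0 = np.floor(xs).astype(int).clip(0, gw - 2)
    fy = (ys - y0)[:, None]  # vertical interpolation fraction
    fx = (xs - x0)[None, :]  # horizontal interpolation fraction
    top = grid[y0][:, x0] * (1 - fx) + grid[y0][:, x0 + 1] * fx
    bottom = grid[y0 + 1][:, x0] * (1 - fx) + grid[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def displacement_field(shape, rng, grid: int = 3, std: float = 10.0):
    """Per-pixel (dy, dx) from Gaussian displacements on a coarse grid."""
    h, w = shape
    dy = bilinear_resize(rng.normal(0.0, std, (grid, grid)), h, w)
    dx = bilinear_resize(rng.normal(0.0, std, (grid, grid)), h, w)
    return dy, dx


def warp(image: np.ndarray, dy: np.ndarray, dx: np.ndarray) -> np.ndarray:
    """Resample `image` at displaced coordinates (nearest neighbour, clipped)."""
    h, w = image.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(yy + dy), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xx + dx), 0, w - 1).astype(int)
    return image[src_y, src_x]


rng = np.random.default_rng(0)
image = rng.random((64, 64))
mask = (image > 0.5).astype(int)
dy, dx = displacement_field(image.shape, rng)
# The same field is applied to the image and its label mask.
warped_image, warped_mask = warp(image, dy, dx), warp(mask, dy, dx)
```

Sharing one displacement field between image and mask keeps the annotation aligned with the deformed image, which is what makes this augmentation usable for segmentation.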

Evaluation

Datasets

EM segmentation challenge: 30 training images, $512 \times 512$, Drosophila VNC electron microscopy
ISBI cell tracking 2015 PhC-U373: phase-contrast microscopy
ISBI cell tracking 2015 DIC-HeLa: differential interference contrast (DIC) microscopy

Metrics

EM challenge: warping error
EM challenge: Rand error
EM challenge: pixel error
Cell tracking challenge: IoU
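For reference, intersection over union on binary masks can be computed as in the sketch below. This is the basic per-mask definition only; the challenge's own protocol for matching and averaging over segmented objects is not reproduced here, and the function name and the empty-mask convention are assumptions.

```python
import numpy as np


def iou(pred, target) -> float:
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Convention assumed here: two empty masks count as a perfect match.
    return float(intersection) / float(union) if union else 1.0


pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(iou(pred, target))  # 0.5
```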

Headline results

EM challenge, U-Net: warping error $0.0003529$, Rand error $0.0382$
EM challenge, IDSIA (sliding-window CNN baseline): warping error $0.000420$, Rand error $0.0504$
EM challenge, U-Net pixel error: $0.0611$
PhC-U373: IoU $0.9203$
DIC-HeLa: IoU $0.7756$

Ablations

Rotated test-time averaging: $7$ rotations improve the final EM submission.
Elastic deformations: identified as the key ingredient under few labels.
Border weighting: improves separation of touching cells.
Dataset-specific post-processing: only such methods beat U-Net on EM Rand error.

Method Strengths and Weaknesses

Strengths

Skip connections recover localization lost by repeated pooling.
Border-weighted loss directly targets touching-cell separation.
Strong results with only $30$ EM training images.
Beats IDSIA on EM warping and Rand error without post-processing.

Weaknesses

Valid convolutions shrink outputs and force overlap-tile inference.
Batch size $1$ reflects heavy memory pressure.
Results are concentrated on biomedical segmentation tasks.
Best EM Rand error still needs specialized post-processing beyond the base model.

Suggestions from the authors

Apply the model to more biomedical segmentation tasks.
Improve training with very few annotated images.
Refine separation of touching instances.
Extend seamless tiling to larger images under GPU limits.
