U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, Philipp Fischer, Thomas Brox

2015 · MICCAI

Problem

Framing

Biomedical segmentation needs pixel-accurate boundaries but usually has very few labeled training images. U-Net addresses this by pairing a symmetric encoder-decoder with skip connections and a border-weighted loss. It reaches a warping error of 0.000353 on the ISBI 2012 EM segmentation challenge and won the ISBI 2015 cell-tracking challenge on two datasets.

Currently Used Methods

Foundational

Proposed Method

Architecture

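U-Net uses a symmetric contracting and expansive path. Each contracting block applies two unpadded ("valid") 3×3 convolutions, each followed by a ReLU, and then a 2×2 max-pooling with stride 2. The number of feature channels doubles at every downsampling step. Each expansive block upsamples the feature map with a 2×2 convolution ("up-convolution") that halves the number of channels, concatenates the correspondingly cropped encoder feature map, and applies two 3×3 convolutions, each followed by a ReLU. Cropping is required because every valid convolution loses one border pixel on each side. The head is a 1×1 convolution that maps each 64-channel feature vector to the class scores, and the network has 23 convolutional layers in total.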
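Because every convolution is valid, spatial sizes shrink in a predictable way. The short Python sketch below traces that arithmetic. It assumes the configuration described above: four pooling levels, two 3×3 valid convolutions per block, and a 2×2 up-convolution that exactly doubles the spatial size. The functions `unet_shapes` and `count_conv_layers` are hypothetical helpers. They are not part of any library or of the authors' code.

```python
from typing import List, Tuple


def unet_shapes(input_size: int, depth: int = 4) -> Tuple[int, List[Tuple[int, int]]]:
    """Trace the side length of a square tile through a valid-convolution U-Net.

    Returns the output side length and, for each skip connection (deepest
    first), the pair (encoder size, size it is cropped to before concatenation).
    """
    size = input_size
    skips: List[int] = []
    for level in range(depth):
        size -= 4  # two valid 3x3 convs, each trims 1 pixel per side
        if size <= 0 or size % 2 != 0:
            # 2x2 max-pooling needs an even, positive side length
            raise ValueError(f"invalid input size {input_size}: got {size} before pooling at level {level}")
        skips.append(size)
        size //= 2  # 2x2 max-pooling, stride 2
    size -= 4  # two valid 3x3 convs at the bottom of the U
    if size <= 0:
        raise ValueError(f"invalid input size {input_size}: bottleneck collapsed")
    crops: List[Tuple[int, int]] = []
    for skip in reversed(skips):
        size *= 2  # 2x2 up-convolution doubles the spatial size
        crops.append((skip, size))  # encoder map is center-cropped to match
        size -= 4  # two valid 3x3 convs after concatenation
    return size, crops


def count_conv_layers(depth: int = 4) -> int:
    contracting = 2 * depth  # two 3x3 convs per contracting block
    bottleneck = 2  # two 3x3 convs at the lowest resolution
    expansive = 3 * depth  # one 2x2 up-conv plus two 3x3 convs per expansive block
    head = 1  # final 1x1 conv
    return contracting + bottleneck + expansive + head
```

With the 572×572 input tile used in the paper, `unet_shapes(572)` gives a 388×388 output, and the skip connections crop 568→392, 280→200, 136→104 and 64→56. A tile size that would reach a max-pooling with an odd side length raises an error. This is why the input tile size has to be chosen so that every pooling sees an even size.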

Loss / Objective

Training uses pixelwise softmax cross-entropy with a class-balancing, border-emphasizing weight map.

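$$p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}$$

$$E = -\sum_{x \in \Omega} w(x) \log p_{\ell(x)}(x)$$

$$w(x) = w_c(x) + w_0 \exp\!\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)$$

Here a_k(x) is the activation of output channel k at pixel x ∈ Ω, K is the number of classes, and ℓ(x) is the true label of pixel x. The term w_c(x) balances class frequencies, while d_1(x) and d_2(x) are the distances to the border of the nearest and second-nearest cell. The paper sets w_0 = 10 and σ ≈ 5 pixels.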
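The objective can be sketched in NumPy as follows. This illustrates the formulas under stated assumptions and is not the authors' implementation. The names `weighted_cross_entropy` and `border_weight_map` are hypothetical. The summary only says that w_c balances class frequencies, so the sketch assumes inverse-frequency ("balanced") weighting of background versus foreground. It also approximates d₁ and d₂ by the Euclidean distance to the nearest pixel of each labelled cell, computed by brute force. That approach is only practical for small masks; a distance transform such as SciPy's `distance_transform_edt` is the usual faster route.

```python
import numpy as np


def log_softmax(logits):
    """Pixelwise log-softmax over the channel axis; logits has shape (K, H, W)."""
    z = logits - logits.max(axis=0, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=0, keepdims=True))


def weighted_cross_entropy(logits, target, weight):
    """E = -sum_x w(x) * log p_{l(x)}(x).

    logits: (K, H, W) activations a_k(x); target: (H, W) integer labels l(x);
    weight: (H, W) weight map w(x).
    """
    log_p = log_softmax(logits)
    h, w = target.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    log_p_true = log_p[target, rows, cols]  # log p_{l(x)}(x) at every pixel
    return float(-(weight * log_p_true).sum())


def border_weight_map(instances, w0=10.0, sigma=5.0):
    """w(x) = w_c(x) + w0 * exp(-(d1 + d2)^2 / (2 sigma^2)).

    instances: (H, W) integer array, 0 = background, 1..N = individual cells.
    """
    instances = np.asarray(instances)
    fg = instances > 0
    # ASSUMPTION: w_c is inverse class frequency ("balanced" weighting).
    wc = np.zeros(instances.shape, dtype=float)
    for is_fg in (False, True):
        mask = fg == is_fg
        if mask.any():
            wc[mask] = fg.size / (2.0 * mask.sum())

    cell_ids = [i for i in np.unique(instances) if i != 0]
    if len(cell_ids) < 2:
        return wc  # the border term needs at least two cells

    # ASSUMPTION: distance to a cell's border is approximated by the Euclidean
    # distance to its nearest pixel (0 inside the cell); brute force, small masks only.
    yy, xx = np.indices(instances.shape)
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    flat = instances.ravel()
    dists = []
    for cid in cell_ids:
        cell = coords[flat == cid]
        diff = coords[:, None, :] - cell[None, :, :]
        dists.append(np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1))
    dists = np.sort(np.stack(dists), axis=0)
    d1, d2 = dists[0], dists[1]  # nearest and second-nearest cell
    border = w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
    return wc + border.reshape(instances.shape)
```

The paper pre-computes the weight map once per ground-truth segmentation. In this sketch, that corresponds to calling `border_weight_map` on each training mask. The result is then passed as `weight`, together with `target = (instances > 0).astype(int)`, for a two-class foreground/background problem. Thin background gaps between touching cells receive the largest weights.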

Algorithm

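Large images are segmented with an overlap-tile strategy. Each output tile is predicted from a larger input tile, and context missing beyond the image border is extrapolated by mirroring the input image. As a result, every output pixel gets the full context its valid convolutions require.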
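A minimal NumPy sketch of overlap-tile inference follows. It assumes a single-channel 2D image and a hypothetical `predict_fn` that maps an input tile of side out_tile + 2·margin to an output tile of side out_tile. The defaults 388 and 92 are taken from the 572→388 tile sizes of the paper's architecture, since margin = (572 − 388) / 2. The function name is illustrative. `mode="symmetric"` is one choice of mirroring, and NumPy's `"reflect"` mode would be an equally plausible alternative.

```python
import numpy as np


def overlap_tile_predict(image, predict_fn, out_tile=388, margin=92):
    """Segment an arbitrarily large 2D image tile by tile.

    predict_fn is a hypothetical model call that takes an
    (out_tile + 2*margin) square tile and returns an out_tile square prediction.
    """
    h, w = image.shape
    ny = -(-h // out_tile)  # ceil division: tiles needed vertically
    nx = -(-w // out_tile)  # ceil division: tiles needed horizontally
    extra_y = ny * out_tile - h
    extra_x = nx * out_tile - w
    # Mirror the image so border tiles also get full context, and pad the
    # bottom/right so the tile grid covers the image exactly.
    padded = np.pad(image, ((margin, margin + extra_y), (margin, margin + extra_x)), mode="symmetric")
    in_tile = out_tile + 2 * margin
    out = np.zeros((ny * out_tile, nx * out_tile), dtype=float)
    for ty in range(ny):
        for tx in range(nx):
            y0, x0 = ty * out_tile, tx * out_tile
            # Input windows overlap by 2*margin; output tiles do not overlap.
            window = padded[y0:y0 + in_tile, x0:x0 + in_tile]
            out[y0:y0 + out_tile, x0:x0 + out_tile] = predict_fn(window)
    return out[:h, :w]  # drop predictions for the padding beyond the image
```

A stand-in `predict_fn` that only center-crops its input, such as `lambda t: t[92:-92, 92:-92]`, reconstructs the input image exactly. This is a convenient check that the tiles line up before plugging in a real network.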

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers