Classifier-Free Diffusion Guidance
Classifier-Free Diffusion Guidance
Problem
Framing
Classifier guidance gives diffusion models a fidelity–diversity control knob, but it needs a separate noisy-image classifier and classifier gradients at sampling. This paper removes that dependency by training one denoiser with conditioning dropout, then mixing its conditional and unconditional predictions at test time.
Currently Used Methods
Foundational and direct antecedents
- @DenoisingDiffusionProbabilisticModels2020 — denoising objective and ancestral reverse sampler.
- Limitation in context: no classifier-free guidance rule.
- @songScoreSDE2020 — continuous-time score-based diffusion formulation.
- Limitation in context: conditional steering still uses external guidance.
- @dhariwalDiffusionBeatGANs2021 — classifier guidance for sharper class-conditional samples.
- Limitation in context: requires a noisy classifier and gradients.
- @nicholImprovedDDPM2021 — improved diffusion parameterization and faster sampling.
- Limitation in context: does not remove classifier dependence.
- @ronnebergerUNet2015 — U-Net backbone for denoising networks.
- Limitation in context: no null-conditioning scheme by itself.
Proposed Method
Architecture
The method keeps the class-conditional diffusion U-Net from @dhariwalDiffusionBeatGANs2021. Training drops the class label to a null token with probability , so one network learns both conditional and unconditional denoisers.

Loss / Objective
Training uses the standard continuous-time -prediction loss with conditioning dropout.
Sampling Rule / Algorithm
Sampling replaces classifier gradients with a linear interpolation of the two score estimates.
Training Procedure
- .
- .
- , .
- Sampling steps .
- FID and IS use samples.
- Reuses the base architecture and most hyperparameters of @dhariwalDiffusionBeatGANs2021.
Evaluation
Datasets
- ImageNet , class-conditional.
- ImageNet , class-conditional.
Metrics
- FID.
- Inception Score.
Headline results
- ImageNet (): FID 1.48, IS 67.95.
- ImageNet (): FID 12.6, IS 170.1.
- ImageNet (): FID 24.83, IS 250.4.
- ImageNet (): FID 7.86, IS 297.98.
- ImageNet (): FID 21.53, IS 421.03.
Table 1: Baseline comparison on ImageNet .
| Model | FID () | IS () |
|---|---|---|
| BigGAN-deep, max IS (Brock et al., 2019) | 25 | 253 |
| BigGAN-deep (Brock et al., 2019) | 5.7 | 124.5 |
| CDM (Ho et al., 2021) | 3.52 | 128.8 |
| LOGAN (Wu et al., 2019) | 3.36 | 148.2 |
| ADM-G (Dhariwal & Nichol, 2021) | 2.97 | - |

Ablations
- Guidance strength : higher raises IS, then worsens FID.
- Dropout : larger values tolerate stronger guidance.
- Sampler steps : more steps improve quality; is the trade-off.
- Strong guidance: samples become sharper, more saturated, and more repetitive.
Method Strengths and Weaknesses
Strengths
- Removes the auxiliary classifier and its extra training pipeline.
- Adds one sampling knob for fidelity–diversity control.
- Reuses existing class-conditional diffusion architectures unchanged.
- Works on both and ImageNet.
Weaknesses
- Large improves IS while degrading FID.
- Strong guidance yields saturated colors and repeated motifs.
- Quality depends on tuning both and .
- The paper gives little theory for the interpolation rule.
Suggestions from the authors
- Derive why classifier-free interpolation improves classifier-based metrics.
- Tune architectures and hyperparameters specifically for classifier-free guidance.
- Test the method in other conditional diffusion settings.
- Study why pure generative diffusion models maximize classifier-based metrics.
Links
Prior Papers
- @DeepUnsupervisedLearningusing2015 — early diffusion formulation behind the forward and reverse noising view.
- @DenoisingDiffusionImplicitModels2020 — related diffusion sampler family for faster generation.
- @DenoisingDiffusionProbabilisticModels2020 — standard denoising objective and reverse-process parameterization reused here.
- @dhariwalDiffusionBeatGANs2021 — direct baseline; classifier-free guidance removes its auxiliary classifier.
- @nicholImprovedDDPM2021 — improved diffusion parameterization and sampling setup used in comparison.
- @ronnebergerUNet2015 — backbone template for the denoising network.
- @songScoreSDE2020 — continuous-time score formulation aligned with the paper's notation.
Further Papers
- @rombachLatentDiffusion2022 — applies diffusion guidance ideas inside a more scalable latent-space generator.