QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni

2023 · NeurIPS

Problem

Framing

Finetuning 33B–65B LLMs is memory-bound: regular 16-bit finetuning of LLaMA-65B needs more than 780 GB of GPU memory, and quantization methods designed for inference break down when gradients must flow through the model during training. QLoRA closes this gap: it freezes a 4-bit quantized base model, backpropagates through the dequantized weights into trainable LoRA adapters, and fits 65B finetuning in under 48 GB without losing the reported 16-bit task performance.
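The memory figures above can be roughly reproduced with back-of-envelope arithmetic. The sketch below is illustrative only: the per-parameter byte counts (2 B weights, 2 B gradients, 8 B for two FP32 Adam moments) are an assumed accounting, not one stated in this note, and activations, adapters, and quantization constants are ignored.

```python
# Back-of-envelope memory estimate. All byte counts are assumptions for illustration.
PARAMS = 65e9  # rounded LLaMA-65B parameter count


def gigabytes(params: float, bytes_per_param: float) -> float:
    """Memory in GB (10^9 bytes) for a given per-parameter storage cost."""
    return params * bytes_per_param / 1e9


# Assumed 16-bit full finetuning with Adam:
# 2 B weights + 2 B gradients + 8 B for two FP32 optimizer moments.
full_16bit_gb = gigabytes(PARAMS, 2 + 2 + 8)

# Frozen 4-bit base model: half a byte per parameter, no gradients or optimizer state.
base_4bit_gb = gigabytes(PARAMS, 0.5)

print(f"16-bit full finetuning: ~{full_16bit_gb:.0f} GB")
print(f"4-bit frozen base:      ~{base_4bit_gb:.1f} GB")
```

Under these assumptions the 4-bit base leaves headroom within a 48 GB budget for adapters, their optimizer state, and activations.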

Currently Used Methods

Foundational

Proposed Method

Architecture

QLoRA stores the pretrained transformer weights in 4-bit NF4, dequantizes them to BF16 for each forward and backward computation, and trains only the LoRA adapters. It adds double quantization of the scaling constants and paged optimizers, and places adapters on all linear layers of the transformer blocks to recover full-finetuning accuracy.

Verified figure: memory-layout comparison of full finetuning, LoRA, and QLoRA; QLoRA uses a 4-bit transformer, LoRA adapters, and CPU paging for optimizer-state spikes.
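To make the benefit of double-quantizing the scaling constants concrete, the following sketch computes the per-parameter overhead of the constants. The block sizes (64 weights per first-level constant, 256 constants per second-level constant) and bit widths (FP32 and 8-bit) are the settings reported in the QLoRA paper rather than values stated in this note, so treat them as assumptions here.

```python
# Overhead of quantization constants, in bits per model parameter.
# Block sizes and bit widths are assumed from the paper's reported settings.
WEIGHT_BLOCK = 64     # weights sharing one first-level constant
CONST_BLOCK = 256     # first-level constants sharing one second-level constant
PARAMS = 65e9         # rounded LLaMA-65B parameter count

# Without double quantization: one FP32 constant per block of 64 weights.
single_bits = 32 / WEIGHT_BLOCK

# With double quantization: 8-bit first-level constants, plus one FP32
# second-level constant per 256 first-level constants.
double_bits = 8 / WEIGHT_BLOCK + 32 / (WEIGHT_BLOCK * CONST_BLOCK)

saved_bits = single_bits - double_bits
saved_gb = saved_bits * PARAMS / 8 / 1e9

print(f"constants only:      {single_bits:.3f} bits/param")
print(f"double quantization: {double_bits:.3f} bits/param")
print(f"saved at 65B params: ~{saved_gb:.1f} GB")
```

The saving per parameter is small, but at 65B parameters it amounts to a few gigabytes, which matters when the target is a single 48 GB device.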

Loss / Objective

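The trainable map is a frozen linear layer W augmented with a low-rank LoRA update: only the adapter factors L1 and L2 are trained, and s is a scalar scaling factor.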

$$\mathbf{Y} = \mathbf{X}\mathbf{W} + s\,\mathbf{X}\mathbf{L}_1\mathbf{L}_2$$
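A minimal pure-Python sketch of this forward map on tiny matrices may help. The helper names (`matmul`, `lora_forward`) and the toy values are hypothetical, and the zero initialization of L2 follows the usual LoRA convention, which is an assumption not stated in this note.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_forward(x: Matrix, w: Matrix, l1: Matrix, l2: Matrix, s: float) -> Matrix:
    """Compute Y = XW + s * X L1 L2, with W frozen and L1, L2 trainable."""
    base = matmul(x, w)                    # frozen path
    update = matmul(matmul(x, l1), l2)     # low-rank adapter path
    return [[b + s * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


x = [[1.0, 2.0]]                            # one input row, width 2
w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]      # frozen 2 x 3 weight
l1 = [[0.5], [0.5]]                         # 2 x 1 factor (rank 1)
l2_zero = [[0.0, 0.0, 0.0]]                 # assumed zero init: adapter is a no-op
l2 = [[1.0, 0.0, -1.0]]                     # toy values standing in for a trained factor

y_init = lora_forward(x, w, l1, l2_zero, s=2.0)
y = lora_forward(x, w, l1, l2, s=2.0)
print(y_init)
print(y)
```

With L2 at zero the layer reproduces the frozen output XW exactly, so training starts from the pretrained model's behavior.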

Quantization Rule

NF4 chooses levels from normal-distribution quantiles, then computes with doubly dequantized frozen weights.

$$q_i = \frac{1}{2}\left(Q_{\mathcal{N}}\!\left(\frac{i}{2^k+1}\right) + Q_{\mathcal{N}}\!\left(\frac{i+1}{2^k+1}\right)\right)$$

$$\mathbf{Y}^{\mathrm{BF16}} = \mathbf{X}^{\mathrm{BF16}}\,\operatorname{doubleDequant}\!\left(c_1^{\mathrm{FP32}}, c_2^{k\text{-bit}}, \mathbf{W}^{\mathrm{NF4}}\right) + \mathbf{X}^{\mathrm{BF16}}\mathbf{L}_1^{\mathrm{BF16}}\mathbf{L}_2^{\mathrm{BF16}}$$
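The quantile rule can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the released NF4 codebook: the index range i = 1 .. 2^k - 1 is chosen here so that both quantiles are finite, which yields 2^k - 1 symmetric levels, and the function names are hypothetical. Single-level absmax block quantization is shown for simplicity; the second-level quantization of the constants is omitted.

```python
from statistics import NormalDist
from typing import List, Tuple


def quantile_levels(k: int) -> List[float]:
    """Levels q_i = (Q(i/(2^k+1)) + Q((i+1)/(2^k+1))) / 2, rescaled to [-1, 1].

    Assumption: i runs over 1 .. 2^k - 1 so every quantile is finite.
    """
    q = NormalDist().inv_cdf          # quantile function of N(0, 1)
    n = 2 ** k + 1
    raw = [0.5 * (q(i / n) + q((i + 1) / n)) for i in range(1, 2 ** k)]
    peak = max(abs(v) for v in raw)
    return [v / peak for v in raw]


def quantize_block(block: List[float], levels: List[float]) -> Tuple[List[int], float]:
    """Absmax-normalize a block and map each weight to its nearest level index."""
    absmax = max(abs(v) for v in block) or 1.0
    codes = [min(range(len(levels)), key=lambda j: abs(levels[j] - v / absmax))
             for v in block]
    return codes, absmax


def dequantize_block(codes: List[int], absmax: float, levels: List[float]) -> List[float]:
    """Look up each level and rescale by the block's quantization constant."""
    return [levels[c] * absmax for c in codes]


LEVELS = quantile_levels(4)
weights = [0.31, -0.12, 0.05, -0.47, 0.0, 0.22]
codes, const = quantize_block(weights, LEVELS)
restored = dequantize_block(codes, const, LEVELS)
print(codes)
print([round(v, 3) for v in restored])
```

Because the levels follow normal quantiles, they are densest near zero, where most weights of a roughly normally distributed tensor lie.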

Training Procedure

Evaluation

Datasets

Metrics

Headline results

Table 1: Mean zero-shot accuracy over tasks and model scales for 4-bit LLaMA variants

Data type
Float
NF4
NF4 + DQ

Ablations

Method Strengths and Weaknesses

Strengths

Weaknesses

Suggestions from the authors

Links

Prior Papers

Further Papers

No vault papers identified as further work yet.