Calibration and Accuracy Recovery

Source: mortalapps.com
TL;DR
  • Calibration and accuracy recovery techniques (such as AdaRound and AdaQuant) optimize exactly how floating-point weights are mathematically rounded into discrete low-precision integer bins.
  • The core purpose is to minimize the layer-wise output error introduced by aggressive quantization without requiring costly, full end-to-end retraining.
  • The primary optimization involves framing the rounding decision as a Quadratic Unconstrained Binary Optimization (QUBO) problem or utilizing a progressive optimization strategy.
  • The essential engineering insight is that simple round-to-nearest logic is sub-optimal: forcing a weight to round away from its nearest integer can cancel out the quantization errors of adjacent weights, keeping the overall layer output much closer to the original.

Why This Matters

When an LLM is compressed to a 4-bit representation, the discrete bins are so wide that standard "round-to-nearest" algorithms introduce large perturbations across the tensor. Adapting the rounding direction offline rescues the model from systemic accuracy collapse, recovering near-FP32 or near-BF16 performance using roughly 1,000 unlabeled calibration images or text sequences. This makes post-training quantization (PTQ) far more robust and often removes the need for expensive Quantization-Aware Training (QAT).

Core Intuition

Imagine balancing a physical scale. If you round two weights "up" to the nearest integer, the scale tips heavily to the right, introducing a bias. But if you round one weight "up" and force the other weight "down" (even if it is slightly further away from that lower integer), the two errors offset each other and the scale stays close to balanced. AdaRound uses a small calibration dataset to measure the output of the entire layer, adjusting the individual rounding directions so that the layer's final output stays as close as possible to the unquantized version.
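The balancing idea can be checked on a toy case. The sketch below is not AdaRound itself: it brute-forces every floor/ceiling combination for a two-weight layer and compares the resulting output error with round-to-nearest. The weights, the scale of 1.0, the calibration inputs, and all function names are made up for illustration.

```python
from itertools import product
import math

def layer_output(weights, x):
    # Output of a single linear neuron: dot product of weights and input.
    return sum(w * xi for w, xi in zip(weights, x))

def output_mse(w_fp, w_q, calib):
    # Mean squared difference between float and quantized outputs.
    errs = [(layer_output(w_fp, x) - layer_output(w_q, x)) ** 2 for x in calib]
    return sum(errs) / len(errs)

def round_to_nearest(w_fp, scale):
    return [scale * math.floor(w / scale + 0.5) for w in w_fp]

def best_rounding(w_fp, scale, calib):
    # Exhaustive search over floor (0) / ceiling (1) per weight.
    # Only feasible for tiny layers; AdaRound replaces this with
    # a gradient-based relaxation.
    floors = [math.floor(w / scale) for w in w_fp]
    best = None
    for bits in product((0, 1), repeat=len(w_fp)):
        cand = [scale * (f + b) for f, b in zip(floors, bits)]
        err = output_mse(w_fp, cand, calib)
        if best is None or err < best[0]:
            best = (err, cand, bits)
    return best

# Hypothetical toy data: both weights sit just above the midpoint,
# so round-to-nearest pushes both of them up.
w_fp = [0.6, 0.6]
scale = 1.0
calib = [[1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]

rtn_err = output_mse(w_fp, round_to_nearest(w_fp, scale), calib)
best_err, best_weights, best_bits = best_rounding(w_fp, scale, calib)

print("round-to-nearest MSE:", rtn_err)
print("best rounding:", best_weights, "MSE:", best_err)
```

In this toy setup the lowest-error assignment rounds one weight up and the other down, which is the "balanced scale" outcome described above. Exhaustive search grows as 2^n in the number of weights, which is why a relaxed, learned formulation is needed for real layers.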

Technical Deep Dive

AdaRound approximates the change in task loss caused by weight perturbations with a second-order Taylor series expansion, which poses the rounding task as a Quadratic Unconstrained Binary Optimization (QUBO) problem.
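As a sketch of how the Taylor expansion leads to a binary quadratic problem, following the derivation in the AdaRound paper (the symbols below are not defined elsewhere in this article and are introduced here for illustration): let $\Delta\mathbf{w}$ be the perturbation caused by rounding, $\mathbf{g}$ the gradient, $\mathbf{H}$ the Hessian of the task loss $\mathcal{L}$ with respect to the weights, and $s$ the quantization scale.

$$
\mathbb{E}\left[\mathcal{L}(\mathbf{x}, \mathbf{y}, \mathbf{w} + \Delta\mathbf{w}) - \mathcal{L}(\mathbf{x}, \mathbf{y}, \mathbf{w})\right]
\approx \Delta\mathbf{w}^{\top}\mathbf{g} + \tfrac{1}{2}\,\Delta\mathbf{w}^{\top}\mathbf{H}\,\Delta\mathbf{w}
$$

For a trained model the gradient term is assumed to be close to zero, which leaves a purely quadratic objective in which each entry of $\Delta\mathbf{w}$ can take only two values, one for rounding down and one for rounding up:

$$
\arg\min_{\Delta\mathbf{w}} \; \Delta\mathbf{w}^{\top}\mathbf{H}\,\Delta\mathbf{w},
\qquad
\Delta w_i \in \left\{\, s\left\lfloor w_i / s \right\rfloor - w_i,\;\; s\left\lceil w_i / s \right\rceil - w_i \,\right\}
$$

The off-diagonal entries of $\mathbf{H}$ are what allow the error of one weight to offset the error of another, which round-to-nearest ignores.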

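It optimizes a layer-wise loss function based on the Mean Squared Error (MSE) between the original and the quantized layer outputs. In the form given in the AdaRound paper:

$$
\arg\min_{\mathbf{V}} \; \left\lVert \mathbf{W}\mathbf{x} - \widetilde{\mathbf{W}}\mathbf{x} \right\rVert_F^2 + \lambda\, f_{\text{reg}}(\mathbf{V}),
\qquad
\widetilde{\mathbf{W}} = s \cdot \operatorname{clip}\!\left(\left\lfloor \mathbf{W} / s \right\rfloor + h(\mathbf{V}),\; n,\; p\right)
$$

Here $\mathbf{W}$ is the floating-point weight matrix, $\mathbf{x}$ the layer input, $s$ the quantization scale, $n$ and $p$ the integer clipping bounds, $h(\mathbf{V})$ a function that maps the learned continuous variable $\mathbf{V}$ into $[0, 1]$, and $f_{\text{reg}}$ a regularizer that pushes each $h(V_{i,j})$ toward 0 or 1.

Instead of relying on the standard Straight-Through Estimator (STE), which suffers from severe gradient mismatch at extreme low bit-widths, AdaRound learns a continuous variable that maps cleanly to the binary rounding decision (floor or ceiling).

AdaQuant / AE-Qdrop introduces a Progressive Optimization Strategy (POS). Initially, activations are quantized but weights remain as floating-point values, so the weights can absorb the perturbation algebraically. Only then is the weight rounding direction optimized, by setting strict upper and lower bounds.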
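The continuous rounding variable can be sketched in a few lines. This is a minimal illustration rather than a full implementation: the rectified sigmoid, its stretch constants (1.1 and -0.1), and the regularizer follow the description in the AdaRound paper, while the function names, the example weight, and the 4-bit range are assumptions chosen for this example. The optimization loop itself is omitted.

```python
import math

# Stretch constants for the rectified sigmoid (values from the AdaRound paper).
ZETA, GAMMA = 1.1, -0.1

def rectified_sigmoid(v):
    # Maps any real v into [0, 1]; saturates exactly at 0 or 1 for large |v|.
    s = 1.0 / (1.0 + math.exp(-v))
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))

def rounding_regularizer(vs, beta):
    # Zero only when every h(v) is exactly 0 or 1; largest at h(v) = 0.5.
    return sum(1.0 - abs(2.0 * rectified_sigmoid(v) - 1.0) ** beta for v in vs)

def soft_quantize(w, scale, v, qmin, qmax):
    # floor(w / scale) plus a learned offset in [0, 1], clipped to the grid.
    q = math.floor(w / scale) + rectified_sigmoid(v)
    return scale * min(qmax, max(qmin, q))

def init_v(w, scale):
    # Start so that h(v) equals the fractional part of w / scale,
    # i.e. the soft-quantized weight initially equals the float weight.
    frac = w / scale - math.floor(w / scale)
    s = (frac - GAMMA) / (ZETA - GAMMA)
    return -math.log(1.0 / s - 1.0)

# Hypothetical example: one weight on a signed 4-bit grid with scale 0.1.
w, scale = 0.37, 0.1
v0 = init_v(w, scale)
print("initial soft weight:", soft_quantize(w, scale, v0, -8, 7))
print("rounded up:  ", soft_quantize(w, scale, 10.0, -8, 7))
print("rounded down:", soft_quantize(w, scale, -10.0, -8, 7))
```

During optimization, the MSE term moves each `v` in whichever direction lowers the layer output error, while the regularizer (with an annealed `beta` in the paper) gradually forces every `rectified_sigmoid(v)` to commit to 0 or 1, so the final result is a hard floor-or-ceiling decision per weight.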

Key Takeaways

  • Standard round-to-nearest algorithms are mathematically sub-optimal for low-bit quantization.
  • AdaRound formulates rounding as a layer-wise MSE optimization problem.
  • By learning a continuous rounding variable, AdaRound avoids the gradient mismatch of the Straight-Through Estimator.
  • The method requires only ~1,000 unlabeled calibration samples.
  • It can be executed entirely offline in hours, without retraining the foundation model.