DS1 spectrogram: StatQAT: Statistical Quantizer Optimization for Deep Networks

StatQAT: Statistical Quantizer Optimization for Deep Networks

2605.17745

Authors

Mehmet Aktukmak,Daniel Huang,Ke Ding

Abstract

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes, selecting optimal quantization parameters remains a key challenge, particularly for diverse data distributions encountered during training and inference.

This work presents a novel statistical error analysis framework for uniform and floating-point quantization, providing theoretical insight into error behavior across quantization configurations. Building on this analysis, we propose iterative quantizers designed for arbitrary data distributions and analytic quantizers tailored for Gaussian-like weight distributions.

These methods enable efficient, low-error quantization suitable for both activations and weights. We incorporate our quantizers into quantization-aware training and evaluate them across integer and floating-point formats.

Experiments demonstrate improved accuracy and stability, highlighting the effectiveness of our approach for training low-precision neural networks.

Resources

Stay in the loop

Every AI paper that matters, free in your inbox daily.

Details

  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.