Paper Title
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
Paper Authors
Paper Abstract
Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and initially apply channel-wise quantization and log$\sqrt{2}$ quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, at only a slight cost in accuracy or computation. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that RepQ-ViT, without hyperparameter tuning or expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level. Code is available at https://github.com/zkkli/RepQ-ViT.
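
To make the channel-wise-to-layer-wise reparameterization concrete, below is a minimal NumPy sketch of the idea. The toy shapes, variable names, and the choice of the layer-wise scale/zero-point as channel means are illustrative assumptions rather than the authors' exact formulation (the official implementation lives in the repository above). The sketch folds the per-channel variation factors $r_1 = s/\tilde{s}$ and $r_2 = z - \tilde{z}$ into the LayerNorm affine parameters, compensates in the next layer's weights and bias, and checks numerically that both the full-precision and the fake-quantized network outputs are preserved.

```python
# Hypothetical sketch of channel-wise -> layer-wise scale reparameterization;
# not the authors' code (see https://github.com/zkkli/RepQ-ViT for that).
import numpy as np

rng = np.random.default_rng(0)
B, C, D = 4, 8, 16          # tokens, channels, next-layer output dim (toy sizes)
bits = 4
qmax = 2**bits - 1

# Normalized activations with severe inter-channel variation.
x = rng.normal(size=(B, C)) * rng.uniform(0.1, 10.0, size=C)

gamma = rng.normal(size=C)   # LayerNorm affine weight
beta = rng.normal(size=C)    # LayerNorm affine bias
W = rng.normal(size=(C, D))  # weights of the following linear layer
b = rng.normal(size=D)       # bias of the following linear layer

a = x * gamma + beta         # post-LayerNorm output seen by the quantizer

# 1) Calibration: channel-wise asymmetric quantization parameters.
s = (a.max(0) - a.min(0)) / qmax          # per-channel scales, shape (C,)
z = np.round(-a.min(0) / s)               # per-channel zero-points (integers)

# 2) Reparameterize to a single layer-wise scale/zero-point (means, as an
#    illustrative choice).
s_tilde = s.mean()
z_tilde = np.round(z.mean())
r1 = s / z_tilde * 0 + s / s_tilde        # per-channel scale variation factors
r2 = z - z_tilde                          # per-channel zero-point offsets

# Fold r1, r2 into the LayerNorm affine parameters ...
gamma_hat = gamma / r1
beta_hat = beta / r1 + r2 * s_tilde
# ... and compensate in the next layer's weights and bias.
W_hat = W * r1[:, None]
b_hat = b - (r2 * s_tilde) @ W_hat

a_hat = x * gamma_hat + beta_hat          # adjusted activation, a/r1 + r2*s_tilde

# Sanity check: the full-precision network output is unchanged.
assert np.allclose(a @ W + b, a_hat @ W_hat + b_hat)

def fake_quantize(v, scale, zero):
    """Uniform asymmetric quantize-dequantize."""
    q = np.clip(np.round(v / scale) + zero, 0, qmax)
    return (q - zero) * scale

# Layer-wise quantization of a_hat reproduces the effect of channel-wise
# quantization of a once the next layer is adjusted.
lhs = fake_quantize(a, s, z) @ W + b                          # calibration view
rhs = fake_quantize(a_hat, s_tilde, z_tilde) @ W_hat + b_hat  # inference view
assert np.allclose(lhs, rhs)
```

The appeal of this construction is that the compensation is a one-time, offline adjustment of existing parameters: inference runs a plain layer-wise quantizer with no per-channel bookkeeping.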
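The post-Softmax side rests on the identity $2^{-q/2} = 2^{-\lfloor q/2 \rfloor} \cdot (\sqrt{2})^{-(q \bmod 2)}$: even log$\sqrt{2}$ codes dequantize with a pure power-of-two shift of the scale, and odd codes with the same shift applied to a scale reparameterized by $1/\sqrt{2}$. The sketch below, again with illustrative names and not the paper's actual inference kernel, checks this equivalence numerically.

```python
# Hypothetical sketch of the log-sqrt2 -> log2 reparameterization view.
import numpy as np

bits = 4
qmax = 2**bits - 1
s = 1.0                                    # quantization scale (illustrative)

def logsqrt2_quant(x, s):
    """q = clip(round(-log_sqrt2(x/s)), 0, 2^b - 1)."""
    q = np.round(-2.0 * np.log2(x / s))    # log_sqrt2(v) = 2 * log2(v)
    return np.clip(q, 0, qmax)

def logsqrt2_dequant(q, s):
    """Calibration-time dequantizer: s * (sqrt2)^(-q)."""
    return s * 2.0 ** (-q / 2.0)

def log2_reparam_dequant(q, s):
    """Inference view: split q = 2k + r with r in {0, 1}; even codes shift
    s by 2^(-k), odd codes shift the reparameterized scale s/sqrt(2)."""
    k, r = np.divmod(q, 2)
    s_eff = np.where(r == 0, s, s / np.sqrt(2.0))
    return s_eff * 2.0 ** (-k)

x = np.random.default_rng(1).uniform(1e-3, 1.0, size=1000)  # softmax-like values
q = logsqrt2_quant(x, s)
assert np.allclose(logsqrt2_dequant(q, s), log2_reparam_dequant(q, s))
```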