Paper Title
RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization
Paper Authors
Paper Abstract
We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not call for long-time retraining. Power-of-Two quantization can convert the multiplications introduced by quantization and dequantization into bit-shifts, which are adopted by many efficient accelerators. However, Power-of-Two scale factors have fewer candidate values, which leads to more rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet demonstrate the excellent performance of our proposed method. Without bells and whistles, RAPQ reaches accuracies of 65% and 48% on ResNet-18 and MobileNetV2 respectively with INT2 weights and INT4 activations. We are the first to propose this more constrained but hardware-friendly Power-of-Two quantization scheme specifically for low-bit PTQ, and we show that it can achieve nearly the same accuracy as SOTA PTQ methods. The code has been released.
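For intuition, the sketch below illustrates why a Power-of-Two scale turns the quantization/dequantization multiplications into bit-shifts: the scale factor is restricted to 2**k, so scaling on integer hardware is a shift by k. This is only a minimal illustrative example under our own assumptions (the function name and the simple scale-selection rule are not taken from the paper); it is not the RAPQ procedure, which adjusts the Power-of-Two scales dynamically across the whole network.

```python
import numpy as np

def power_of_two_quantize(x, n_bits=4):
    """Illustrative power-of-two quantization sketch (not the RAPQ algorithm)."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    # Pick k so that the power-of-two scale 2**k roughly covers the range of x.
    # RAPQ instead trades off rounding vs. clipping error network-wide.
    k = int(np.ceil(np.log2(np.abs(x).max() / qmax + 1e-12)))
    scale = 2.0 ** k
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)  # quantize
    x_hat = q.astype(np.float32) * scale  # dequantize: a shift by k on integer hardware
    return q, k, x_hat

if __name__ == "__main__":
    w = np.random.randn(64).astype(np.float32)
    q, k, w_hat = power_of_two_quantize(w, n_bits=4)
    print("shift amount k =", k, "max abs error =", np.abs(w - w_hat).max())
```

Because the scale has few candidate values (only powers of two), the rounding/clipping trade-off is coarser than with arbitrary float scales, which is the accuracy gap RAPQ is designed to close.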