Paper Title


Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training

Authors

Yunshan Zhong, Gongrui Nan, Yuxin Zhang, Fei Chao, Rongrong Ji

Abstract


Quantization-aware training (QAT) receives extensive popularity as it well retains the performance of quantized networks. In QAT, the contemporary experience is that all quantized weights are updated for an entire training process. In this paper, this experience is challenged based on an interesting phenomenon we observed. Specifically, a large portion of quantized weights reaches the optimal quantization level after a few training epochs, which we refer to as the partly scratch-off lottery ticket. This straightforward-yet-valuable observation naturally inspires us to zero out gradient calculations of these weights in the remaining training period to avoid meaningless updating. To effectively find the ticket, we develop a heuristic method, dubbed lottery ticket scratcher (LTS), which freezes a weight once the distance between the full-precision one and its quantization level is smaller than a controllable threshold. Surprisingly, the proposed LTS typically eliminates 50%-70% of weight updating and 25%-35% of the FLOPs of the backward pass, while still yielding performance on par with or even better than the compared baseline. For example, compared with the baseline, LTS improves 2-bit MobileNetV2 by 5.05%, eliminating 46% of weight updating and 23% of the FLOPs of the backward pass. Code is at https://github.com/zysxmu/LTS.
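The freezing criterion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform symmetric quantizer, the function names, and the threshold value are all assumptions for the sake of the example.

```python
import numpy as np

def quantize(w, num_bits=2, w_max=1.0):
    """Uniform symmetric quantizer (an assumed stand-in): snap each
    full-precision weight to its nearest quantization level."""
    pos_levels = 2 ** (num_bits - 1) - 1          # e.g. 1 for 2-bit
    step = w_max / pos_levels                     # distance between levels
    return np.clip(np.round(w / step), -pos_levels - 1, pos_levels) * step

def freeze_mask(w, num_bits=2, threshold=0.05):
    """LTS-style criterion from the abstract: a weight is frozen once the
    distance between its full-precision value and its quantization level
    falls below a controllable threshold."""
    dist = np.abs(w - quantize(w, num_bits))
    return dist < threshold                       # True = frozen

# Usage sketch: zero out gradients of frozen weights so they are no
# longer updated in the remaining training period.
w = np.array([0.49, 0.02, -0.51, 0.30])
grad = np.array([0.1, 0.2, 0.3, 0.4])
mask = freeze_mask(w, num_bits=2, threshold=0.05)
grad = np.where(mask, 0.0, grad)
```

In a real training loop the same masking would be applied to the weight gradients each step (e.g. via a gradient hook), which is where the reported savings in weight updates and backward-pass FLOPs come from.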
