Paper Title

Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment

Paper Authors

Qiang Hu, Yuejun Guo, Maxime Cordy, Xiaofei Xie, Wei Ma, Mike Papadakis, Yves Le Traon

Paper Abstract

Deep Neural Networks (DNNs) have gained considerable attention in the past decades due to their astounding performance in different applications, such as natural language modeling, self-driving assistance, and source code understanding. With rapid exploration, more and more complex DNN architectures have been proposed, along with huge pre-trained models. The common way to use such DNN models on end-user devices (e.g., mobile phones) is to perform model compression before deployment. However, recent research has demonstrated that model compression, e.g., model quantization, yields accuracy degradation as well as output disagreements when tested on unseen data. Since unseen data often include distribution shifts and appear frequently in the wild, the quality and reliability of quantized models are not ensured. In this paper, we conduct a comprehensive study to characterize and help users understand the behaviors of quantized models. Our study considers 4 datasets spanning from image to text, 8 DNN architectures including feed-forward and recurrent neural networks, and 42 shifted sets with both synthetic and natural distribution shifts. The results reveal that 1) data with distribution shifts trigger more disagreements than data without; 2) quantization-aware training produces more stable models than standard, adversarial, and Mixup training; 3) disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than other uncertainty metrics for distinguishing disagreements; 4) retraining with identified disagreements has limited effectiveness in removing them. We open-source our code and models as a new benchmark for further study of quantized models.
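
The abstract's findings center on two operations: detecting disagreements between a full-precision model and its quantized counterpart, and computing the $Margin$ metric (the gap between the top-1 and top-2 output probabilities). The sketch below is not the authors' released code; it uses PyTorch's post-training dynamic quantization on a toy feed-forward classifier, and the model, data, and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# A small feed-forward classifier standing in for the studied architectures.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Post-training dynamic quantization of the Linear layers to int8.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(512, 32)  # placeholder batch standing in for (shifted) test data
with torch.no_grad():
    p_fp = torch.softmax(model(x), dim=1)   # full-precision probabilities
    p_q = torch.softmax(qmodel(x), dim=1)   # quantized-model probabilities

# A disagreement is an input on which the two models predict different labels.
disagree = p_fp.argmax(dim=1) != p_q.argmax(dim=1)

# Margin on the full-precision model: top-1 minus top-2 probability.
top2 = p_fp.topk(2, dim=1).values
margin = top2[:, 0] - top2[:, 1]

print(f"disagreements: {int(disagree.sum())}/{len(x)}")
if disagree.any():
    print(f"mean margin on disagreements: {margin[disagree].mean():.3f}")
print(f"mean margin on agreements:    {margin[~disagree].mean():.3f}")
```

Under the paper's third finding, the mean margin on disagreeing inputs should come out noticeably lower than on agreeing ones, which is what makes ranking inputs by $Margin$ a practical way to surface likely disagreements before deployment.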
