Paper Title
OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
Paper Authors
Paper Abstract
As Deep Neural Networks (DNNs) are usually over-parameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization have been proposed to significantly reduce the model size, for which the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer. Existing solutions obtain the compression allocation in an iterative/manual fashion while finetuning the compressed model, and thus suffer from efficiency issues. Different from prior art, in this paper we propose a novel One-shot Pruning-Quantization (OPQ) method, which analytically solves the compression allocation using only the pre-trained weight parameters. During finetuning, the compression module is fixed and only the weight parameters are updated. To our knowledge, OPQ is the first work to reveal that a pre-trained model is sufficient for solving pruning and quantization simultaneously, without any complex iterative/manual optimization at the finetuning stage. Furthermore, we propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook, which leads to low bit-rate allocation without the extra overhead introduced by traditional channel-wise quantization. Comprehensive experiments on ImageNet with AlexNet/MobileNet-V1/ResNet-50 show that our method improves accuracy and training efficiency while obtaining significantly higher compression rates compared to the state-of-the-art.
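The abstract outlines two ideas: a one-shot allocation of pruning and quantization computed from the pre-trained weights alone, and a unified channel-wise quantization in which all channels of a layer share one codebook. The sketch below is a minimal illustration of that pipeline, assuming magnitude-based pruning and a symmetric uniform quantizer; the function name `one_shot_prune_quantize` and the fixed sparsity/bit-width settings are hypothetical stand-ins, not the paper's analytic allocation.

```python
import numpy as np

def one_shot_prune_quantize(weights, sparsity=0.5, bits=4):
    """Illustrative sketch (not the paper's exact analytic solution):
    prune by a magnitude threshold derived once from the pretrained
    weights, then quantize the surviving weights of the layer with a
    single codebook shared by all of its channels."""
    w = weights.copy()

    # --- Pruning: one-shot magnitude threshold from pretrained weights ---
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > threshold
    w *= mask

    # --- Unified channel-wise quantization: one codebook per layer ---
    # All channels share the same uniform codebook, avoiding the
    # per-channel scale/zero-point overhead of conventional
    # channel-wise quantization.
    nonzero = w[mask]
    if nonzero.size:
        scale = np.abs(nonzero).max() / (2 ** (bits - 1) - 1)
        w = np.round(w / scale) * scale  # symmetric uniform quantizer

    return w, mask

# Usage: compress one conv layer's pretrained weights in one shot.
rng = np.random.default_rng(0)
layer_w = rng.normal(size=(64, 3, 3, 3)).astype(np.float32)  # (out_ch, in_ch, kH, kW)
w_q, mask = one_shot_prune_quantize(layer_w, sparsity=0.6, bits=4)
print(f"kept {mask.mean():.1%} of weights, {np.unique(w_q).size} distinct values")
```

Because the threshold and the layer-wide codebook are derived once from the pretrained weights, the compression module stays fixed during finetuning and only the weight parameters would be updated, matching the one-shot workflow the abstract describes.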