Paper Title

Automated Model Compression by Jointly Applied Pruning and Quantization

Paper Authors

Wenting Tang, Xingxing Wei, Bo Li

Paper Abstract

In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost to meet deployment requirements. However, such a step-wise application of pruning and quantization may lead to suboptimal solutions and unnecessary time consumption. In this paper, we tackle this issue by integrating network pruning and quantization into a unified joint compression problem and then using AutoML to solve it automatically. We find that the pruning process can be regarded as channel-wise quantization with 0 bits. Thus, the separate two-step pruning and quantization can be simplified into one-step quantization with mixed precision. This unification not only simplifies the compression pipeline but also avoids compression divergence. To implement this idea, we propose automated model compression by jointly applied pruning and quantization (AJPQ). AJPQ is designed with a hierarchical architecture: the layer controller controls the layer sparsity, and the channel controller decides the bit-width for each kernel. Following the same importance criterion, the layer controller and the channel controller collaboratively decide the compression strategy. With the help of reinforcement learning, our one-step compression is achieved automatically. Compared with state-of-the-art automated compression methods, our method obtains better accuracy while considerably reducing storage. For fixed-precision quantization, AJPQ reduces the model size by more than 5× and the computation by 2×, with a slight performance increase, for Skynet in remote sensing object detection. When mixed precision is allowed, AJPQ reduces the model size by 5× with only a 1.06% top-5 accuracy decline for MobileNet in the classification task.
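The core unification admits a short sketch. This is only an illustration, not the authors' implementation: the uniform symmetric quantizer and the names quantize_channel, compress_layer, and bit_widths are assumptions made here. A channel assigned 0 bits is zeroed out entirely, so a single mixed-precision bit-width vector expresses both the pruning and the quantization decisions:

```python
import numpy as np

def quantize_channel(weights, bits):
    """Uniform symmetric quantization of one channel to `bits` bits.

    A bit-width of 0 zeroes the channel out entirely, i.e. pruning
    becomes a special case of channel-wise quantization.
    Assumes bits == 0 or bits >= 2.
    """
    if bits == 0:
        return np.zeros_like(weights)  # 0-bit quantization == channel pruning
    levels = 2 ** (bits - 1) - 1       # e.g. 127 for 8-bit signed
    scale = np.abs(weights).max() / levels
    if scale == 0.0:                   # all-zero channel: nothing to quantize
        return np.zeros_like(weights)
    q = np.clip(np.round(weights / scale), -levels - 1, levels)
    return q * scale

def compress_layer(layer_weights, bit_widths):
    """Apply per-channel bit-widths (the channel controller's decisions);
    the number of zeros in `bit_widths` fixes the layer sparsity
    (the layer controller's decision)."""
    return np.stack([quantize_channel(w, b)
                     for w, b in zip(layer_weights, bit_widths)])

# Toy usage: 4 output channels of a 3x3 conv over 3 input channels.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3)).astype(np.float32)
# Hypothetical policy output: prune channel 0, mixed precision elsewhere.
compressed = compress_layer(w, bit_widths=[0, 8, 4, 4])
assert (compressed[0] == 0).all()  # channel 0 is pruned
```

In AJPQ itself, such bit-widths would be produced by the layer and channel controllers trained with reinforcement learning, rather than fixed by hand as in this toy example.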
