Paper Title
Joint Pruning & Quantization for Extremely Sparse Neural Networks
Paper Authors
Paper Abstract
We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity in quantized networks to enable implementation on low-cost, low-power accelerator hardware. Since many practical applications involve dense prediction tasks, we choose stereo depth estimation as our target. We propose a two-stage pruning and quantization pipeline and introduce a Taylor Score, alongside a new fine-tuning mode, to achieve extreme sparsity without sacrificing performance. Our evaluation not only shows that pruning and quantization should be investigated jointly, but also that almost 99% of the memory demand can be cut while hardware costs can be reduced by up to 99.9%. In addition, for comparison with other works, we demonstrate that our pruning stage alone beats the state of the art when applied to ResNet on CIFAR10 and ImageNet.
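The abstract does not give the exact form of the Taylor Score, but first-order Taylor pruning criteria typically score a parameter group by the magnitude of weight times gradient, approximating the loss change if that group were removed. The following is a minimal, hedged sketch of such a per-channel criterion; the function name, tensor shapes, and aggregation are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def taylor_channel_scores(weights: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Illustrative first-order Taylor importance per output channel.

    weights, grads: arrays of shape (out_channels, in_channels, kH, kW),
    where grads holds dL/dw from a backward pass. The score for channel i
    is sum over the channel of |w * dL/dw|, a common approximation of the
    loss increase caused by zeroing that channel.
    """
    return np.abs(weights * grads).sum(axis=(1, 2, 3))

# Usage: rank channels so the lowest-importance ones are pruned first.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))   # toy conv weights
g = rng.standard_normal((8, 3, 3, 3))   # toy gradients
scores = taylor_channel_scores(w, g)
prune_order = np.argsort(scores)        # indices, least important first
```

In an actual pruning loop, the gradients would come from fine-tuning batches, and the lowest-scoring channels would be removed iteratively between fine-tuning steps.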