Paper Title
Standard Deviation-Based Quantization for Deep Neural Networks
Paper Authors
Paper Abstract
Quantization of deep neural networks is a promising approach to reducing inference cost, making it feasible to run deep networks on resource-constrained devices. Inspired by existing methods, we propose a new framework that learns the quantization intervals (discrete values) using knowledge of the network's weight and activation distributions, i.e., their standard deviations. Furthermore, we propose a novel base-2 logarithmic quantization scheme that quantizes weights to power-of-two discrete values. Our proposed scheme allows us to replace resource-hungry high-precision multipliers with simple shift-add operations. In our evaluations, our method outperforms existing work on the CIFAR10 and ImageNet datasets and even achieves higher accuracy than the full-precision models with 3-bit weights and activations. Moreover, our scheme simultaneously prunes the network's parameters and allows the pruning ratio to be adjusted flexibly during quantization.
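Code Sketch
Below is a minimal Python sketch of the two mechanisms the abstract names: deriving a tensor's clipping range from its standard deviation, and snapping weights to signed powers of two so that each multiplication reduces to a bit shift. The scale multiple `alpha`, the nearest-exponent rounding rule, and the pruning threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pow2_quantize(w, bits=3, alpha=3.0):
    """Quantize weights to {0, +/-2^e}, with the range tied to alpha * std(w).

    A sketch under assumptions: alpha and the rounding/pruning rules are
    placeholders, not the method published in the paper.
    """
    clip = alpha * w.std()                        # clipping threshold from the weight distribution
    levels = 2 ** (bits - 1) - 1                  # exponent levels per sign (one bit for the sign)
    max_e = int(np.floor(np.log2(clip)))          # largest representable exponent
    sign = np.sign(w)
    mag = np.clip(np.abs(w), 1e-12, clip)
    e = np.round(np.log2(mag))                    # nearest exponent in the log domain
    e = np.clip(e, max_e - levels + 1, max_e)     # keep exactly `levels` exponent values
    q = sign * 2.0 ** e
    q[np.abs(w) < 2.0 ** (max_e - levels)] = 0.0  # weights below the smallest level prune to zero
    return q, e.astype(int)

def shift_mul(x_int, e):
    """Multiply an integer activation by 2^e with a shift instead of a multiplier.

    Negative exponents become (truncating) right shifts.
    """
    return x_int << e if e >= 0 else x_int >> (-e)
```

With 3-bit weights this yields seven values, {0, ±2^(max_e-2), ±2^(max_e-1), ±2^max_e}, so each multiply-accumulate becomes a shift followed by an add, and the zeroed weights illustrate the pruning behavior the abstract mentions.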