Paper Title
Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training
Paper Authors
Paper Abstract
Deep Neural Networks are successful but highly computationally expensive learning systems. One of the main sources of time and energy drain is the well-known backpropagation (backprop) algorithm, which accounts for roughly 2/3 of the computational complexity of training. In this work we propose a method for reducing the computational cost of backprop, which we name dithered backprop. It consists of applying a stochastic quantization scheme to intermediate results of the method. The particular quantization scheme, called non-subtractive dither (NSD), induces sparsity that can be exploited by computing efficient sparse matrix multiplications. Experiments on popular image classification tasks show that it induces 92% sparsity on average across a wide set of models, at no or negligible accuracy drop compared to state-of-the-art approaches, thus significantly reducing the computational complexity of the backward pass. Moreover, we show that our method is fully compatible with state-of-the-art training methods that reduce the bit precision of training down to 8 bits, and can therefore further reduce the computational requirements. Finally, we discuss and demonstrate potential benefits of applying dithered backprop in a distributed training setting, where both communication and compute efficiency may increase simultaneously with the number of participating nodes.
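To make the core idea concrete, the following NumPy sketch shows how non-subtractive dither can induce sparsity: a uniform dither is added to a tensor before rounding to a coarse grid, so many small entries round to exactly zero and the resulting tensor can feed a sparse matrix multiplication. The function name, the step-size choice, and where exactly in the backward pass the quantizer is applied are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nsd_quantize(x, step):
    """Non-subtractive dithered quantization (illustrative sketch).

    A uniform dither u ~ U(-step/2, step/2) is added before rounding to a
    grid of width `step`; the dither is not subtracted afterwards (hence
    "non-subtractive"). Entries smaller than the step frequently round to
    exactly zero, which is the sparsity that dithered backprop exploits.
    """
    u = np.random.uniform(-step / 2, step / 2, size=x.shape)
    return step * np.round((x + u) / step)

# Toy example on a fake gradient-like tensor (values and step size are
# made up for illustration only).
grad = 0.01 * np.random.randn(1024, 1024)
q = nsd_quantize(grad, step=0.05)
print("sparsity:", np.mean(q == 0.0))  # fraction of exact zeros
```

In practice the zero entries would be skipped by a sparse matrix-multiplication kernel in the backward pass, which is where the claimed compute savings come from.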