Paper Title

Training Integer-Only Deep Recurrent Neural Networks

Paper Authors

Vahid Partovi Nia, Eyyüb Sari, Vanessa Courville, Masoud Asgharian

Paper Abstract

Recurrent neural networks (RNN) are the backbone of many text and speech applications. These architectures are typically made up of several computationally complex components such as non-linear activation functions, normalization, bi-directional dependence, and attention. In order to maintain good accuracy, these components are frequently run using full-precision floating-point computation, making them slow, inefficient, and difficult to deploy on edge devices. In addition, the complex nature of these operations makes them challenging to quantize using standard quantization methods without a significant performance drop. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear (PWL) approximation of activation functions, to serve a wide range of state-of-the-art RNNs. The proposed method enables RNN-based language models to run on edge devices with a $2\times$ improvement in runtime and a $4\times$ reduction in model size, while maintaining accuracy comparable to their full-precision counterparts.
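To make two of the ideas named in the abstract concrete, the sketch below illustrates (a) a symmetric fake-quantization step of the kind commonly used in quantization-aware training and (b) a piecewise linear (PWL) approximation of a tanh activation. This is a minimal illustration, not the authors' implementation: the bit width, knot count, and fixed knot placement are assumptions made here for clarity (the paper's PWL is adaptive).

```python
# Illustrative sketch only (not the iRNN implementation from the paper).
# (a) fake_quantize: simulate integer rounding during training so the model
#     learns weights/activations that survive low-precision inference.
# (b) pwl_tanh: replace a non-linear activation with a piecewise linear
#     approximation defined by a small set of knots.
import numpy as np

def fake_quantize(x, num_bits=8):
    """Symmetric fake quantization: scale, round, clip, then rescale."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax + 1e-12  # per-tensor scale (an assumption)
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                          # dequantized values carry the rounding error

def pwl_tanh(x, num_knots=16):
    """Approximate tanh by linear interpolation between sampled knots."""
    knots = np.linspace(-4.0, 4.0, num_knots)  # fixed knots here; adaptive in the paper
    values = np.tanh(knots)                    # exact activation sampled at the knots
    return np.interp(x, knots, values)         # flat outside the knot range

x = np.linspace(-6.0, 6.0, 7)
print(fake_quantize(x))           # inputs snapped to an 8-bit grid
print(pwl_tanh(x) - np.tanh(x))   # small approximation error near zero
```

Because the rounding error is injected during the forward pass of training, the network can compensate for it, which is what allows the quantized model to stay close to its full-precision accuracy.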
