Paper Title
INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices
Authors
Abstract
The intensive computation of Automatic Speech Recognition (ASR) models hinders their deployment on mobile devices. In this paper, we present a novel quantized Winograd optimization pipeline that combines quantization and fast convolution to achieve efficient inference acceleration for ASR models on mobile devices. To avoid the information loss caused by combining quantization with Winograd convolution, we propose a Range-Scaled Quantization (RSQ) training method that expands the quantized numerical range and distills knowledge from high-precision values. Moreover, an improved Conv1D-equipped DFSMN (ConvDFSMN) model is designed for mobile deployment. We conduct extensive experiments on both ConvDFSMN and Wav2letter models. The results demonstrate that the models can be effectively optimized with the proposed pipeline. In particular, Wav2letter achieves a 1.48× speedup with an approximate 0.07% WER decrease on ARMv7-based mobile devices.
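For readers unfamiliar with the fast convolution mentioned in the abstract, the following is a minimal floating-point sketch of the standard Winograd F(2,3) algorithm for a 3-tap Conv1D. It is not the paper's INT8 pipeline and does not implement RSQ. The function names (`direct_conv1d`, `transform_filter`, `winograd_conv1d`) are hypothetical and chosen only for this illustration. It shows how two outputs are produced with four multiplications instead of six. It also shows that the input transform adds and subtracts pairs of samples, so transformed values can reach roughly twice the input magnitude, which is one plausible reason that naively combining it with 8-bit quantization loses information.

```python
def direct_conv1d(x, g):
    """Reference: valid (unpadded) 1D correlation of x with a 3-tap filter g."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]


def transform_filter(g):
    """Winograd F(2,3) filter transform; computed once per filter."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_conv1d(x, g):
    """Winograd F(2,3): each 4-sample tile yields 2 outputs with 4 multiplies.

    Illustrative assumption: the output length (len(x) - 2) is even, so the
    input splits cleanly into overlapping tiles with stride 2.
    """
    assert (len(x) - 2) % 2 == 0, "this sketch needs an even number of outputs"
    u = transform_filter(g)
    out = []
    for i in range(0, len(x) - 2, 2):
        d0, d1, d2, d3 = x[i:i + 4]
        # Input transform: only additions/subtractions, but the value range
        # can grow to about 2x the input range (relevant for INT8 storage).
        v = [d0 - d2, d1 + d2, d2 - d1, d1 - d3]
        # Element-wise products: the only 4 multiplications for this tile.
        m = [vi * ui for vi, ui in zip(v, u)]
        # Output transform.
        out.append(m[0] + m[1] + m[2])
        out.append(m[1] - m[2] - m[3])
    return out
```

Calling `winograd_conv1d(x, g)` on an 8-sample input gives the same 6 outputs as `direct_conv1d(x, g)`, up to floating-point rounding.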