Paper Title
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
Paper Authors
Paper Abstract
In this paper, we present TrimTail, a simple but effective emission regularization method that improves the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not require any alignment. We demonstrate that TrimTail is computationally cheap, can be applied online, and can be optimized with any training loss and any model architecture on any dataset without extra effort, by applying it to various end-to-end streaming ASR networks trained with either CTC loss [1] or Transducer loss [2]. We achieve 100 $\sim$ 200ms latency reduction with equal or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using TrimTail, we achieve a 400ms algorithmic improvement in User Sensitive Delay (USD) with an accuracy loss of less than 0.2.
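The spectrogram-level length penalty described above amounts to cutting a random number of trailing frames from each input utterance before training. A minimal sketch of that idea is below; the function name, the uniform sampling scheme, and the `max_trim_frames` parameter are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def trim_tail(spectrogram, max_trim_frames=30, rng=None):
    """Sketch of a TrimTail-style length penalty (hypothetical helper).

    Randomly drops up to `max_trim_frames` trailing frames from a
    (time, freq) spectrogram, shortening the input without needing
    any alignment information.
    """
    rng = rng or np.random.default_rng()
    n_trim = int(rng.integers(0, max_trim_frames + 1))  # frames to cut
    if n_trim == 0:
        return spectrogram
    return spectrogram[:-n_trim]  # keep all but the trailing frames
```

Because the operation is a single slice per utterance, it adds negligible cost to an online data pipeline and is agnostic to the downstream loss (CTC or Transducer) and model architecture.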