Streaming Align-Refine for Non-autoregressive Deliberation

Paper title

Streaming Align-Refine for Non-autoregressive Deliberation

Authors

Weiran Wang, Ke Hu, Tara N. Sainath

Abstract

We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model. Our algorithm facilitates a simple greedy decoding procedure, and at the same time is capable of producing the decoding result at each frame with limited right context, thus enjoying both high efficiency and low latency. These advantages are achieved by converting the offline Align-Refine algorithm to be streaming-compatible, with a novel transformer decoder architecture that performs local self-attentions for both text and audio, and a time-aligned cross-attention at each layer. Furthermore, we perform discriminative training of our model with the minimum word error rate (MWER) criterion, which has not been done in the non-AR decoding literature. Experiments on voice search datasets and LibriSpeech show that with reasonable right context, our streaming model performs as well as the offline counterpart, and discriminative training leads to further WER gain when the first-pass model has small capacity.
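
The "local self-attention with limited right context" mentioned in the abstract can be pictured as a banded attention mask. The following is only an illustrative sketch, not the authors' implementation: the function name `local_attention_mask`, its parameters, and the window sizes are assumptions chosen for the example, and the abstract does not state the values the paper uses.

```python
def local_attention_mask(num_frames, left_context, right_context):
    """Build a num_frames x num_frames boolean attention mask.

    mask[i][j] is True when position i may attend to position j, that is,
    when j lies in the window [i - left_context, i + right_context].
    All names and sizes here are hypothetical, for illustration only.
    """
    return [
        [i - left_context <= j <= i + right_context for j in range(num_frames)]
        for i in range(num_frames)
    ]


# Illustrative sizes: 5 frames, 2 frames of history, 1 frame of lookahead.
mask = local_attention_mask(num_frames=5, left_context=2, right_context=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# Prints:
# xx...
# xxx..
# xxxx.
# .xxxx
# ..xxx
```

In this sketch a smaller `right_context` means each position waits for fewer future frames before it can be computed, which is the kind of trade-off between latency and available context that the abstract refers to.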
