Paper Title

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

Paper Authors

Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

Paper Abstract

For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
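
A minimal sketch of what a confidence estimation module (CEM) of this kind might look like, assuming a PyTorch implementation: a small feed-forward head sits on top of per-token decoder features (for example the decoder hidden state, attention context and the softmax probability of the emitted token) and is trained with binary cross-entropy against token-correctness labels. All module names, feature choices and dimensions below are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch of a lightweight confidence estimation module (CEM)
# placed on top of an existing attention-based sequence-to-sequence ASR model.
# Feature set and dimensions are assumptions for demonstration only.
import torch
import torch.nn as nn

class ConfidenceEstimationModule(nn.Module):
    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Small feed-forward head mapping per-token features to a scalar logit.
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, token_features: torch.Tensor) -> torch.Tensor:
        # token_features: (batch, num_tokens, feature_dim), e.g. a concatenation
        # of decoder hidden states, attention context vectors and the emitted
        # token's softmax probability (an assumed feature set).
        return torch.sigmoid(self.net(token_features)).squeeze(-1)

# Training sketch: the target is 1 if the hypothesised token is correct
# (e.g. determined by aligning the hypothesis to the reference), else 0.
cem = ConfidenceEstimationModule(feature_dim=640)
optimiser = torch.optim.Adam(cem.parameters(), lr=1e-4)
criterion = nn.BCELoss()

features = torch.randn(8, 20, 640)               # dummy per-token features
targets = torch.randint(0, 2, (8, 20)).float()   # dummy correctness labels

optimiser.zero_grad()
confidence = cem(features)                       # (8, 20) per-token confidences
loss = criterion(confidence, targets)
loss.backward()
optimiser.step()
```

At inference time the per-token confidences can be averaged (or otherwise pooled) to give a word- or utterance-level score, which is the kind of downstream use (e.g. data selection for semi-supervised learning) the abstract alludes to.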
