Paper Title

Learning Confidence for Transformer-based Neural Machine Translation

Authors

Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, Mu Li

Abstract

Confidence estimation aims to quantify the confidence of a model's prediction, providing an expectation of success. A well-calibrated confidence estimate enables accurate failure prediction and proper risk measurement when given noisy samples and out-of-distribution data in real-world settings. However, this task remains a severe challenge for neural machine translation (NMT), where probabilities from the softmax distribution fail to describe when the model is probably mistaken. To address this problem, we propose learning an unsupervised confidence estimate jointly with the training of the NMT model. We interpret confidence as how many hints the NMT model needs to make a correct prediction, where more hints indicate lower confidence. Specifically, the NMT model is given the option to ask for hints to improve translation accuracy, at the cost of a slight penalty. We then approximate the model's level of confidence by counting the number of hints it uses. We demonstrate that our learned confidence estimate achieves high accuracy on extensive sentence- and word-level quality estimation tasks. Analytical results verify that our confidence estimate can correctly assess underlying risk in two real-world scenarios: (1) discovering noisy samples and (2) detecting out-of-domain data. We further propose a novel confidence-based instance-specific label smoothing approach built on our learned confidence estimate, which outperforms standard label smoothing.
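The hint mechanism described above can be illustrated with a minimal NumPy sketch of a confidence-interpolated training objective: the model predicts a confidence score, a "hint" mixes its output distribution toward the ground truth in proportion to its lack of confidence, and a penalty term charges the model for asking. The interpolation form, the penalty weight `lam`, and the function names here are illustrative assumptions in the spirit of the abstract, not the paper's exact formulation.

```python
import numpy as np

def confidence_loss(probs, conf, target, lam=0.1):
    """Sketch of a hint-based confidence objective (illustrative, not the paper's exact loss).

    probs  : (V,) softmax output over the vocabulary
    conf   : scalar in (0, 1], the model's predicted confidence
    target : index of the ground-truth token
    lam    : penalty weight for asking for hints (assumed hyperparameter)
    """
    # Hint: interpolate the prediction toward the one-hot ground truth.
    # Low confidence -> a bigger hint -> an easier prediction.
    y = np.zeros_like(probs)
    y[target] = 1.0
    mixed = conf * probs + (1.0 - conf) * y

    # Translation loss on the hinted distribution.
    nll = -np.log(mixed[target])

    # Penalty for using hints: cheap when confident, costly when not.
    penalty = -np.log(conf)

    return nll + lam * penalty
```

Under this sketch, a model that is wrong but asks for a hint (low `conf`) pays the penalty yet ends up with a lower total loss than one that is wrong and fully confident, so the learned confidence score tracks how much help the model needed.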
