REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Paper Title

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Authors

Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas

Abstract

Accent mismatch is a critical problem for end-to-end ASR. This paper aims to address it by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by the proof of equivalence, we introduce reDAT, a novel technique based on DAT, which relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT achieves competitive results over accent-specific baselines on both native and non-native English accents, and up to 13% relative WER reduction on unseen accents; our reDAT yields further relative improvements over DAT of 3% and 8% on non-native accents of American and British English, respectively.
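
For readers unfamiliar with the Jensen-Shannon divergence named in the abstract, the following is a minimal sketch of how it is computed for two discrete distributions. It is only an illustration of the quantity itself, not the paper's training code; the function names `kl_divergence` and `js_divergence` are hypothetical and chosen for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists.

    Terms where p_i == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL of p and q to their mixture m."""
    # The mixture is strictly positive wherever p or q is, so the KL terms
    # below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical accent-classifier output distributions for two domains.
same = js_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # non-overlapping distributions
```

The divergence is zero when the two distributions coincide and reaches its maximum of log 2 (in nats) when they do not overlap, which is why driving it toward zero corresponds to domain output distributions that no longer distinguish accents.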
