Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training

Paper Title

Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training

Authors

Qi, Jinzi; Van hamme, Hugo

Abstract

The scarcity of training data and the large speaker variation in dysarthric speech lead to poor accuracy and poor speaker generalization in spoken language understanding systems for dysarthric speech. By working on the speech features, we focus on improving model generalization with limited dysarthric data. Factorized Hierarchical Variational Auto-Encoders (FHVAE) trained without supervision have shown their advantage in disentangling content and speaker representations. Earlier work showed that dysarthria is reflected in both feature vectors. Here, we add adversarial training to bridge the gap between the control and dysarthric speech data domains. We extract dysarthria-invariant and speaker-invariant features using weak supervision. The extracted features are evaluated on a Spoken Language Understanding task and yield higher accuracy on unseen speakers with more severe dysarthria than features from the basic FHVAE model or plain filterbanks.
