Semi-supervised cross-lingual speech emotion recognition

Paper title

Semi-supervised cross-lingual speech emotion recognition

Authors

Mirko Agarla, Simone Bianco, Luigi Celona, Paolo Napoletano, Alexey Petrovsky, Flavio Piccoli, Raimondo Schettini, Ivan Shanin

Abstract

Performance in single-language Speech Emotion Recognition (SER) has improved greatly in the last few years thanks to deep learning techniques. However, cross-lingual SER remains a challenge in real-world applications because of two main factors: the first is the large gap between the source and target domain distributions; the second is that, for a new language, unlabeled utterances are far more plentiful than labeled ones. Taking these factors into account, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when only a few labeled examples are available in the target domain (i.e. the new language). Our method is based on a Transformer and adapts to the new domain by exploiting a pseudo-labeling strategy on the unlabeled utterances. In particular, the use of hard and soft pseudo-labels is investigated. We thoroughly evaluate the proposed method in a speaker-independent setup on both the source and the new language and show its robustness across five languages belonging to different language families. The experimental findings indicate that unweighted accuracy increases by 40% on average compared to state-of-the-art methods.
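The abstract contrasts hard and soft pseudo-labels without giving their exact formulation. The sketch below illustrates the general distinction only, and it is not the authors' implementation: the confidence threshold, the temperature-based sharpening, and all function names are assumptions introduced here for illustration. A hard pseudo-label commits to the single most likely class, while a soft pseudo-label keeps the full (optionally sharpened) probability distribution as the training target.

```python
import math

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_pseudo_label(probs, threshold=0.0):
    """One-hot target for the most likely class.

    Returns None when the top probability is below `threshold`
    (a hypothetical confidence filter, not taken from the paper).
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None
    return [1.0 if i == best else 0.0 for i in range(len(probs))]

def soft_pseudo_label(probs, temperature=1.0):
    """Keep the whole distribution as the target.

    temperature < 1 sharpens it, temperature = 1 leaves it unchanged
    (the sharpening step is an assumption for illustration).
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def cross_entropy(target, probs, eps=1e-12):
    """Loss between a (hard or soft) target and the model's prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

# Example: scores for four hypothetical emotion classes on one unlabeled utterance.
probs = softmax([2.0, 0.5, 0.1, -1.0])
hard = hard_pseudo_label(probs, threshold=0.5)
soft = soft_pseudo_label(probs, temperature=0.5)
```

In a self-training loop of this kind, either target would be fed to `cross_entropy` together with the model's prediction for the same unlabeled utterance. Hard labels discard the model's uncertainty, whereas soft labels preserve it.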
