CopyPaste: An Augmentation Method for Speech Emotion Recognition

Paper Title

CopyPaste: An Augmentation Method for Speech Emotion Recognition

Authors

Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Abstract

Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a novel, perceptually motivated augmentation procedure for SER. Assuming that the presence of emotions other than neutral dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional utterance (emotion E) and a neutral utterance can still be labeled with emotion E. We hypothesize that SER performance can be improved by using these concatenated utterances in model training. To verify this, three CopyPaste schemes are tested on two deep learning models: one trained independently and another using transfer learning from an x-vector model, a speaker recognition model. We observed that all three CopyPaste schemes improve SER performance on all three datasets considered: MSP-Podcast, Crema-D, and IEMOCAP. Additionally, CopyPaste performs better than noise augmentation, and using the two together improves SER performance further. Our experiments on noisy test sets suggest that CopyPaste is effective even in noisy test conditions.
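
The core idea in the abstract, concatenating an emotional utterance with a neutral one and keeping the emotion label, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `copy_paste` and `augment`, the representation of a waveform as a plain list of samples, the random choice of segment order, and the pairing of each emotional utterance with one randomly drawn neutral utterance are all assumptions made for illustration. The abstract does not spell out the three CopyPaste schemes, so only the emotional-plus-neutral case is shown.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: mono audio samples, all at one shared sample rate.
Waveform = List[float]
Example = Tuple[Waveform, str]


def copy_paste(
    emotional: Waveform,
    emotion_label: str,
    neutral: Waveform,
    rng: Optional[random.Random] = None,
) -> Example:
    """Concatenate an emotional and a neutral utterance; keep the emotion label."""
    rng = rng or random.Random()
    # Assumption: the order of the two segments is picked at random.
    if rng.random() < 0.5:
        mixed = emotional + neutral
    else:
        mixed = neutral + emotional
    return mixed, emotion_label


def augment(
    dataset: List[Example], neutral_label: str = "neutral", seed: int = 0
) -> List[Example]:
    """Return the original examples plus one concatenated copy per emotional utterance."""
    rng = random.Random(seed)
    neutral_pool = [wav for wav, label in dataset if label == neutral_label]
    augmented = list(dataset)
    if not neutral_pool:
        return augmented  # nothing to paste without neutral utterances
    for wav, label in dataset:
        if label != neutral_label:
            partner = rng.choice(neutral_pool)
            augmented.append(copy_paste(wav, label, partner, rng))
    return augmented
```

In a real pipeline the concatenation would more likely be applied to audio arrays or feature sequences during batch construction. The list-based version above only shows how the label is carried over to the concatenated utterance.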
