Paper Title

CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition

Paper Authors

Wen, Xin-Cheng, Ye, Jia-Xin, Luo, Yan, Xu, Yong, Wang, Xuan-Ze, Wu, Chang-Li, Liu, Kun-Hong

Paper Abstract

Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. An essential challenge in SER is to extract common attributes from different speakers or languages, especially when a model trained on a specific source corpus has to recognize unknown data coming from another speech corpus. To address this challenge, a Capsule Network (CapsNet) and Transfer Learning-based Mixed Task Net (CTL-MTNet) is proposed in this paper to deal with both the single-corpus and cross-corpus SER tasks simultaneously. For the single-corpus task, a combined Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding a self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules. The high-level features extracted by CPAC provide sufficient discriminative ability. Furthermore, to handle the cross-corpus task, CTL-MTNet employs a Corpus Adaptation Adversarial Module (CAAM) that combines CPAC with Margin Disparity Discrepancy (MDD), which learns domain-invariant emotion representations by extracting strong emotion commonalities. Experiments, including ablation studies and visualizations, on both single- and cross-corpus tasks using four well-known SER datasets in different languages are conducted for performance evaluation and comparison. The results indicate that, for both tasks, CTL-MTNet performs better than a number of state-of-the-art methods in all cases. The source code and the supplementary materials are available at: https://github.com/MLDMXM2017/CTLMTNet
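For readers who want a concrete picture of the CPAC idea described above (self-attention guiding which convolutional features feed the capsules), the following is a minimal, illustrative PyTorch sketch. The layer sizes, number of routing iterations, log-Mel input shape, and the four emotion classes are assumptions made for illustration only, not the authors' implementation; refer to the linked repository for the official code.

```python
# Minimal sketch of a CPAC-style block: convolution-pooling front end,
# self-attention over the time axis, then a routed capsule layer.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)


class CPACSketch(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_primary=16, caps_dim=8,
                 n_classes=4, out_dim=16, routing_iters=3):
        super().__init__()
        # Convolution-pooling front end over (batch, 1, time, n_mels)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, d_model, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((None, 1)),          # pool away the mel axis
        )
        # Self-attention over the remaining time frames
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Primary capsules and the transform to emotion (class) capsules
        self.primary = nn.Linear(d_model, n_primary * caps_dim)
        self.W = nn.Parameter(0.01 * torch.randn(n_primary, n_classes, out_dim, caps_dim))
        self.routing_iters = routing_iters
        self.n_primary, self.caps_dim = n_primary, caps_dim
        self.n_classes, self.out_dim = n_classes, out_dim

    def forward(self, x):                               # x: (batch, 1, time, n_mels)
        h = self.conv(x).squeeze(-1).transpose(1, 2)    # (batch, T', d_model)
        h, _ = self.attn(h, h, h)                       # attention-weighted frames
        h = h.mean(dim=1)                               # (batch, d_model)
        u = squash(self.primary(h).view(-1, self.n_primary, self.caps_dim))
        # Prediction vectors u_hat[b, i, j, :] from primary capsule i to class capsule j
        u_hat = torch.einsum('ijkl,bil->bijk', self.W, u)
        b_logits = torch.zeros(u.size(0), self.n_primary, self.n_classes, device=x.device)
        for _ in range(self.routing_iters):             # dynamic routing by agreement
            c = F.softmax(b_logits, dim=-1).unsqueeze(-1)
            v = squash((c * u_hat).sum(dim=1))          # (batch, n_classes, out_dim)
            b_logits = b_logits + (u_hat * v.unsqueeze(1)).sum(-1)
        return v.norm(dim=-1)                           # capsule lengths as class scores
```

Under these assumptions, CPACSketch()(torch.randn(2, 1, 128, 64)) returns a (2, 4) tensor of capsule lengths, one score per emotion class; the cross-corpus CAAM/MDD component of the paper is not sketched here.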
