Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Paper title

Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Authors

Bagus Tris Atmaja, Zanjabila, Akira Sasou

Abstract

In this paper, we demonstrate the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting three tasks (multitask learning): emotion, age, and native country. The pre-trained model was trained with the wav2vec 2.0 large robust model on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate the performance of multitask learning. The classifier was a linear network with two independent layers and shared layers, including the output layers. This study explores multitask learning with different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and normalizations for predicting paralinguistic information from speech.
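
The single-score evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the three per-task metrics, so the example below assumes each task has already been reduced to a positive, higher-is-better score; the function name `harmonic_mean` and the sample values are hypothetical and are not results from the paper.

```python
def harmonic_mean(scores):
    """Harmonic mean of per-task scores.

    Assumption (not from the paper): every score is higher-is-better.
    A non-positive score makes the combined score 0.0, which reflects
    how the harmonic mean is dominated by the weakest task.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical per-task scores for emotion, age, and country.
emotion_score, age_score, country_score = 0.6, 0.5, 0.4
combined = harmonic_mean([emotion_score, age_score, country_score])
print(f"combined score: {combined:.4f}")
```

Compared with an arithmetic mean, the harmonic mean penalizes a model that does well on two tasks but poorly on the third, which is one plausible reason to use it for multitask evaluation.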
