Paper title
Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words
Paper authors
Paper abstract
A wake-up word (WUW) is a short sentence used to activate a speech recognition system so that it can receive the user's speech input. WUW utterances carry not only the lexical information needed to wake up the system but also non-lexical information such as speaker identity and emotion. In particular, recognizing the user's emotional state may enrich voice communication. However, few datasets label the emotional state of WUW utterances. In this paper, we introduce Hi, KIA, a new WUW dataset consisting of 488 Korean-accented emotional utterances collected from four male and four female speakers, with each utterance labeled with one of four emotional states: angry, happy, sad, or neutral. We present the step-by-step procedure used to build the dataset, covering scenario selection, post-processing, and human validation for label agreement. We also provide two classification models for WUW speech emotion recognition using the dataset: one based on traditional hand-crafted features, and the other a transfer-learning approach using a pre-trained neural network. These classification models can serve as benchmarks for further research.