Paper Title
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
Paper Authors
Paper Abstract
Emotion recognition is a challenging and actively-studied research area that plays a critical role in emotion-aware human-computer interaction systems. In a multimodal setting, temporal alignment between different modalities has not been well investigated yet. This paper presents a new model named Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states to explicitly capture the alignment relationship between speech and text, and a novel group gated fusion (GGF) layer to integrate the representations of different modalities. We empirically show that the attention-aligned representations significantly outperform the last hidden states of the LSTM, and that the proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
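To make the two components of the abstract concrete, the following is a minimal NumPy sketch of (one direction of) attention-based alignment between speech and text LSTM hidden states, followed by a simplified gated fusion of the pooled modality representations. All shapes, parameter matrices, and the mean-pooling step are illustrative assumptions; the paper's actual GBAN/GGF formulation may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: T_s speech frames, T_t text tokens, hidden size d.
T_s, T_t, d = 6, 4, 8
H_speech = rng.standard_normal((T_s, d))  # stand-in for speech LSTM hidden states
H_text = rng.standard_normal((T_t, d))    # stand-in for text LSTM hidden states

# Attention-based alignment (text -> speech direction): each text state
# attends over all speech states, yielding a speech representation that is
# aligned to the text sequence. The reverse direction is symmetric.
scores = H_text @ H_speech.T               # (T_t, T_s) similarity scores
weights = softmax(scores, axis=-1)         # attention distribution per text token
H_speech_aligned = weights @ H_speech      # (T_t, d) aligned speech states

# Simplified gated fusion sketch: a sigmoid gate decides, per dimension,
# how much of each modality's pooled representation to keep. W_g is a
# placeholder for learned gate parameters.
v_text = H_text.mean(axis=0)
v_speech = H_speech_aligned.mean(axis=0)
W_g = rng.standard_normal((2 * d, d))
gate = 1.0 / (1.0 + np.exp(-np.concatenate([v_text, v_speech]) @ W_g))
fused = gate * v_text + (1.0 - gate) * v_speech  # (d,) fused representation
print(fused.shape)  # prints (8,)
```

In this sketch the attention weights sum to one over the speech frames for each text token, so the aligned representation is a convex combination of speech states; the gate then interpolates between the two modalities dimension-wise rather than concatenating them.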