Paper Title


Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis

Paper Authors

Ronghao Lin, Haifeng Hu

Paper Abstract


Multimodal representation learning is a challenging task in which previous work has mostly focused on either uni-modal pre-training or cross-modal fusion. In fact, we regard modeling multimodal representation as building a skyscraper, where laying a stable foundation and designing the main structure are equally essential. The former is like encoding robust uni-modal representations, while the latter is like integrating interactive information among different modalities; both are critical to learning an effective multimodal representation. Recently, contrastive learning has been successfully applied to representation learning; it can serve as the pillar of the skyscraper and help the model extract the most important features contained in the multimodal data. In this paper, we propose a novel framework named MultiModal Contrastive Learning (MMCL) for multimodal representation to capture intra- and inter-modality dynamics simultaneously. Specifically, we devise uni-modal contrastive coding with an efficient uni-modal feature augmentation strategy to filter out the inherent noise contained in the acoustic and visual modalities and acquire more robust uni-modal representations. Besides, a pseudo-siamese network is presented to predict representations across different modalities, which successfully captures cross-modal dynamics. Moreover, we design two contrastive learning tasks, instance- and sentiment-based contrastive learning, to promote the prediction process and learn more interactive information related to sentiment. Extensive experiments conducted on two public datasets demonstrate that our method surpasses state-of-the-art methods.
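To make the instance-based contrastive objective mentioned in the abstract concrete, below is a minimal NumPy sketch of a standard InfoNCE-style loss: each anchor embedding is pulled toward its matching positive (same batch index) while the other positives in the batch serve as negatives. The function name, the NumPy implementation, and the temperature value are illustrative assumptions; this is a generic sketch of the technique, not the paper's actual MMCL code.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of (anchor, positive)
    embedding pairs. Row i of `anchors` should match row i of `positives`;
    all other rows act as in-batch negatives."""
    # L2-normalize so the dot product becomes cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Log-softmax over each row; the correct pairing sits on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In a multimodal setting, the anchor and positive batches could be embeddings of the same utterance from two different modalities (e.g. acoustic and visual), so minimizing the loss encourages cross-modal agreement; the paper's sentiment-based variant would further change which pairs count as positives.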
