Paper Title
Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Paper Authors
Paper Abstract
As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore cross-modal alignment, or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate for identifying fine-grained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pre-training and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.
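To make the "unified multimodal encoder-decoder" idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation; for that, see the linked repository). It assumes pre-extracted image region features and a BART-style Transformer; the class name `UnifiedMultimodalEncoderDecoder`, all dimensions, and layer counts are illustrative assumptions. The point it shows is that image regions and text tokens are fed to one shared encoder, and a single decoder generates output token sequences, so different pre-training and downstream tasks can reuse the same architecture by changing only the target sequence.

```python
# Hypothetical sketch of a unified multimodal encoder-decoder (not the paper's code).
# Image region features and text tokens share one encoder; one decoder generates
# the target sequence for whichever pre-training or downstream task is being trained.
import torch
import torch.nn as nn


class UnifiedMultimodalEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=50265, d_model=768, n_heads=12,
                 n_layers=6, img_feat_dim=2048):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project pre-extracted image region features into the text embedding space.
        self.img_proj = nn.Linear(img_feat_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, text_ids, target_ids):
        # Concatenate projected image regions with text token embeddings so the
        # shared encoder attends across both modalities in a single sequence.
        src = torch.cat([self.img_proj(img_feats), self.token_emb(text_ids)], dim=1)
        tgt = self.token_emb(target_ids)
        # Causal mask so each target position only sees earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(target_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # token logits for the generated sequence


# Toy usage: 36 image regions and a 20-token sentence, decoding a 10-token target.
model = UnifiedMultimodalEncoderDecoder()
img_feats = torch.randn(2, 36, 2048)
text_ids = torch.randint(0, 50265, (2, 20))
target_ids = torch.randint(0, 50265, (2, 10))
logits = model(img_feats, text_ids, target_ids)
print(logits.shape)  # torch.Size([2, 10, 50265])
```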