Paper Title
Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Paper Authors
Paper Abstract
As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore cross-modal alignment, or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate for identifying fine-grained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pre-training and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.
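To make the "unified multimodal encoder-decoder" idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation; for that, see the linked repository). It assumes pre-extracted image region features and a BART-style Transformer; the class name `UnifiedMultimodalEncoderDecoder`, all dimensions, and layer counts are illustrative assumptions. The point it shows is that image regions and text tokens are fed to one shared encoder, and a single decoder generates output token sequences, so different pre-training and downstream tasks can reuse the same architecture by changing only the target sequence.

```python
# Hypothetical sketch of a unified multimodal encoder-decoder (not the paper's code).
# Image region features and text tokens share one encoder; one decoder generates
# the target sequence for whichever pre-training or downstream task is being trained.
import torch
import torch.nn as nn


class UnifiedMultimodalEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=50265, d_model=768, n_heads=12,
                 n_layers=6, img_feat_dim=2048):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project pre-extracted image region features into the text embedding space.
        self.img_proj = nn.Linear(img_feat_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, text_ids, target_ids):
        # Concatenate projected image regions with text token embeddings so the
        # shared encoder attends across both modalities in a single sequence.
        src = torch.cat([self.img_proj(img_feats), self.token_emb(text_ids)], dim=1)
        tgt = self.token_emb(target_ids)
        # Causal mask so each target position only sees earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(target_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(hidden)  # token logits for the generated sequence


# Toy usage: 36 image regions and a 20-token sentence, decoding a 10-token target.
model = UnifiedMultimodalEncoderDecoder()
img_feats = torch.randn(2, 36, 2048)
text_ids = torch.randint(0, 50265, (2, 20))
target_ids = torch.randint(0, 50265, (2, 10))
logits = model(img_feats, text_ids, target_ids)
print(logits.shape)  # torch.Size([2, 10, 50265])
```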