Paper Title
Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis
Authors
Abstract
Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of US mainstream media and social media posts from Reddit and Twitter. We conduct in-depth analyses of news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best-performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%.
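The abstract describes the best model only as a late-fusion architecture pretrained with a triplet objective over multimodal content. The sketch below shows what those two ideas usually mean. It assumes that late fusion is plain concatenation of separately computed text and image embeddings, and that the objective is the standard triplet margin loss, max(0, d(anchor, positive) - d(anchor, negative) + margin), with Euclidean distance. The function names, the margin of 1.0, and the toy vectors are hypothetical illustrations, not the authors' implementation. In practice the embeddings would come from trained text and image encoders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def late_fuse(text_emb: Vector, image_emb: Vector) -> List[float]:
    """Hypothetical late fusion: concatenate independently encoded text and image vectors."""
    return list(text_emb) + list(image_emb)


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Standard triplet margin loss: the anchor should be closer to the positive
    than to the negative by at least `margin`; otherwise a penalty is incurred."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Toy fused embeddings (2-d "text" part + 1-d "image" part); values are made up.
    anchor = late_fuse([0.0, 0.0], [0.0])
    positive = late_fuse([3.0, 4.0], [0.0])      # distance 5 from the anchor
    easy_negative = late_fuse([6.0, 8.0], [0.0])  # distance 10: already far enough
    hard_negative = late_fuse([0.0, 0.0], [5.0])  # distance 5: violates the margin

    print(triplet_margin_loss(anchor, positive, easy_negative))  # 0.0
    print(triplet_margin_loss(anchor, positive, hard_negative))  # 1.0
```

In a pretraining loop, the anchor and positive could be, for example, two items sharing an ideological leaning and the negative an item with a different leaning; this triplet construction is also an assumption, since the abstract does not specify it. Frameworks such as PyTorch provide the same loss as `torch.nn.TripletMarginLoss`.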