Aesthetically Relevant Image Captioning

Paper Title

Aesthetically Relevant Image Captioning

Authors

Zhipeng Zhong, Fei Zhou, Guoping Qiu

Abstract

Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic ratings to images, whilst image aesthetic captioning (IAC) aims to generate textual descriptions of the aesthetic aspects of images. In this paper, we study image AQA and IAC together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC). Based on the observation that most textual comments on an image are about objects and their interactions rather than aspects of aesthetics, we first introduce the concept of the Aesthetic Relevance Score (ARS) of a sentence and develop a model to automatically label a sentence with its ARS. We then use the ARS to design the ARIC model, which includes an ARS-weighted IAC loss function and an ARS-based diverse aesthetic caption selector (DACS). We present extensive experimental results that show the soundness of the ARS concept and the effectiveness of the ARIC model, demonstrating that texts with higher ARS values can predict aesthetic ratings more accurately and that the new ARIC model can generate more accurate, more aesthetically relevant and more diverse image captions. Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, together with code for implementing ARIC, is available at https://github.com/PengZai/ARIC.
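The abstract does not give the exact form of the ARS-weighted loss or of DACS. Purely as an illustration, the Python sketch below assumes that each training caption's negative log-likelihood is scaled by its ARS, and that a diverse selector can be approximated by greedily trading a candidate's ARS against its word overlap with captions already chosen. The function names, the normalisation by total ARS, the Jaccard similarity and the `penalty` parameter are all hypothetical; they are not taken from the paper or from the ARIC repository.

```python
import math
from typing import List, Sequence


def caption_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of one caption, given the model's probability
    for each ground-truth token (each value must lie in (0, 1])."""
    return -sum(math.log(p) for p in token_probs)


def ars_weighted_loss(batch_token_probs: Sequence[Sequence[float]],
                      ars_scores: Sequence[float]) -> float:
    """Hypothetical ARS-weighted captioning loss: each caption's NLL is scaled
    by its ARS, then divided by the total ARS of the batch so the loss scale
    does not depend on how many highly relevant captions the batch contains."""
    if len(batch_token_probs) != len(ars_scores):
        raise ValueError("need exactly one ARS per caption")
    total_weight = sum(ars_scores)
    if total_weight <= 0:
        raise ValueError("at least one caption must have a positive ARS")
    weighted = sum(w * caption_nll(probs)
                   for probs, w in zip(batch_token_probs, ars_scores))
    return weighted / total_weight


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, a crude stand-in for caption similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_diverse_captions(captions: Sequence[str], ars_scores: Sequence[float],
                            k: int, penalty: float = 1.0) -> List[str]:
    """Hypothetical greedy selector: repeatedly pick the candidate whose ARS,
    minus `penalty` times its highest similarity to an already chosen caption,
    is largest."""
    chosen: List[int] = []
    remaining = list(range(len(captions)))
    while remaining and len(chosen) < k:
        def score(i: int) -> float:
            max_sim = max((jaccard(captions[i], captions[j]) for j in chosen),
                          default=0.0)
            return ars_scores[i] - penalty * max_sim
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return [captions[i] for i in chosen]


if __name__ == "__main__":
    # The first caption is fully aesthetic (ARS 1.0); the second is irrelevant (ARS 0.0),
    # so only the first contributes: the loss equals -log(0.5) = log(2).
    print(ars_weighted_loss([[0.5], [0.25]], [1.0, 0.0]))
    candidates = [
        "great use of light and shadow",
        "great use of light and shadow here",
        "the composition feels balanced",
    ]
    # The near-duplicate second caption is penalised, so the third one is picked instead.
    print(select_diverse_captions(candidates, [0.9, 0.85, 0.6], k=2))
```

Dividing by the total ARS rather than by the batch size is one possible design choice in this sketch; with batch-size normalisation, batches made up mostly of low-relevance comments would simply produce a smaller overall loss.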
