Learning to Infer Unseen Attribute-Object Compositions

Paper Title

Learning to Infer Unseen Attribute-Object Compositions

Authors

Chen, Hui; Nan, Zhixiong; Jiang, Jingjing; Zheng, Nanning

Abstract

Recognizing unseen attribute-object compositions is critical for machines to learn to decompose and compose complex concepts as people do. Most existing methods are limited to single-attribute-object compositions and can hardly distinguish compositions with similar appearances. This paper proposes a graph-based model that can flexibly recognize both single- and multi-attribute-object compositions. The model maps the visual features of images and the attribute-object category labels, represented by word embedding vectors, into a latent space. Then, under the constraints of attribute-object semantic association, distances are calculated between the visual features and the corresponding label semantic features in the latent space. During inference, the composition closest to the given image feature among all candidate compositions is taken as the result. In addition, we build a large-scale Multi-Attribute Dataset (MAD) with 116,099 images and 8,030 composition categories. Experiments on MAD and two other single-attribute-object benchmark datasets demonstrate the effectiveness of our approach.
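
The inference step described in the abstract (pick the composition whose label embedding lies closest to the image feature in the latent space) can be illustrated with a minimal sketch. This is not the paper's graph-based model: the toy word vectors, the `compose` function (a plain mean of word vectors standing in for the learned label mapping), and the `infer` helper are all hypothetical and chosen only to make the nearest-composition idea concrete.

```python
import math
from itertools import product

# Toy word embeddings (made-up values, NOT taken from the paper).
WORD_VECS = {
    "red": [1.0, 0.0, 0.0],
    "green": [0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
    "car": [0.5, 0.5, 0.0],
}


def compose(attributes, obj):
    """Embed a composition as the mean of its word vectors.

    Assumption: this averaging is a stand-in for the learned mapping of
    label embeddings into the latent space.
    """
    words = list(attributes) + [obj]
    dim = len(WORD_VECS[obj])
    return [sum(WORD_VECS[w][i] for w in words) / len(words) for i in range(dim)]


def infer(image_feature, candidates):
    """Return the candidate whose embedding is nearest (Euclidean) to the image feature."""
    return min(
        candidates,
        key=lambda c: math.dist(image_feature, compose(c[0], c[1])),
    )


# Candidate set: single-attribute compositions plus one multi-attribute one.
candidates = [((a,), o) for a, o in product(["red", "green"], ["apple", "car"])]
candidates.append((("red", "green"), "apple"))

# Hypothetical image feature, assumed to be already mapped into the latent space.
image_feature = [0.1, 0.45, 0.5]
prediction = infer(image_feature, candidates)
print(prediction)
```

Because every candidate is scored the same way, single- and multi-attribute compositions compete in one ranking, which mirrors the flexibility the abstract claims; an unseen composition only needs to appear in the candidate list, not in any training data.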
