Towards Overcoming False Positives in Visual Relationship Detection

Paper Title

Towards Overcoming False Positives in Visual Relationship Detection

Authors

Daisheng Jin, Xiao Ma, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Zhoujun Li, Xianglong Liu, Hongsheng Li

Abstract

In this paper, we investigate the cause of the high false positive rate in Visual Relationship Detection (VRD). We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify, e.g., those caused by inaccurate object detection, which leads to under-fitting of the low-frequency difficult proposals. This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA), a robust VRD framework that alleviates the influence of false positives. To effectively optimize the model under an imbalanced distribution, SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch sampling. BNPS divides proposals into 5 well-defined sub-classes and generates a balanced training distribution according to the inverse frequency. BNPS gives an easier optimization landscape and significantly reduces the number of false positives. To further resolve the low-frequency, challenging false positive proposals with high spatial ambiguity, we improve the spatial modeling ability of SABRA in two aspects: a simple and efficient multi-head heterogeneous graph attention network (MH-GAT) that models the global spatial interactions of objects, and a spatial mask decoder that learns the local spatial configuration. SABRA outperforms state-of-the-art (SOTA) methods by a large margin on two human-object interaction (HOI) datasets and one general VRD dataset.
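
The inverse-frequency sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the five sub-classes or describe the exact sampling procedure, so the integer labels 0 to 4, the function names, and the choice to sample with replacement are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per item, equal to 1 / (size of the item's sub-class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_batch(proposals, labels, batch_size, rng=None):
    """Draw a mini-batch with replacement so each sub-class is equally likely.

    `proposals` and `labels` are parallel lists; the labels here are
    hypothetical sub-class ids (0..4), not the categories used in the paper.
    """
    rng = rng or random.Random()
    weights = inverse_frequency_weights(labels)
    pairs = list(zip(proposals, labels))
    return rng.choices(pairs, weights=weights, k=batch_size)


if __name__ == "__main__":
    # Toy, heavily imbalanced label distribution over five hypothetical sub-classes.
    labels = [0] * 90 + [1] * 6 + [2] * 2 + [3] * 1 + [4] * 1
    proposals = [f"proposal_{i}" for i in range(len(labels))]
    batch = sample_balanced_batch(proposals, labels, batch_size=1000, rng=random.Random(0))
    print(Counter(label for _, label in batch))
```

Because every item is weighted by the inverse of its sub-class size, the total weight of each sub-class is the same, so a rare sub-class is drawn about as often as a common one.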
