Paper Title
Towards Overcoming False Positives in Visual Relationship Detection
Paper Authors
Paper Abstract
In this paper, we investigate the cause of the high false positive rate in Visual Relationship Detection (VRD). We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify (e.g., those arising from inaccurate object detection), which leads to under-fitting of the low-frequency difficult proposals. This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA), a robust VRD framework that alleviates the influence of false positives. To optimize the model effectively under the imbalanced distribution, SABRA adopts a Balanced Negative Proposal Sampling (BNPS) strategy for mini-batch sampling. BNPS divides proposals into five well-defined sub-classes and generates a balanced training distribution according to their inverse frequency. BNPS yields an easier optimization landscape and significantly reduces the number of false positives. To further resolve the low-frequency, challenging false positive proposals with high spatial ambiguity, we improve the spatial modeling ability of SABRA in two respects: a simple and efficient multi-head heterogeneous graph attention network (MH-GAT) that models the global spatial interactions of objects, and a spatial mask decoder that learns the local spatial configuration. SABRA outperforms state-of-the-art (SOTA) methods by a large margin on two human-object interaction (HOI) datasets and one general VRD dataset.
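The abstract says that the sampling strategy balances the training distribution across proposal sub-classes according to inverse frequency. The following is a minimal sketch of that general idea only, not the authors' implementation: the sub-class ids 0 to 4, the toy counts, the function names `inverse_frequency_weights` and `sample_balanced_batch`, and the choice to sample with replacement are all assumptions made for illustration, since the abstract does not define the five sub-classes or the sampling details.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per proposal, equal to 1 / (size of its sub-class).

    With these weights every sub-class carries the same total weight,
    so each sub-class is equally likely to be drawn.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_batch(labels, batch_size, seed=None):
    """Draw proposal indices with replacement using inverse-frequency weights.

    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Toy data: hypothetical sub-class ids 0..4 with a heavily skewed distribution
# (sub-class 0 stands in for the abundant, easy negatives).
labels = [0] * 900 + [1] * 60 + [2] * 25 + [3] * 10 + [4] * 5

batch = sample_balanced_batch(labels, batch_size=5000, seed=0)
sampled = Counter(labels[i] for i in batch)

# Each sub-class should now appear in roughly one fifth of the batch,
# even though sub-class 0 makes up 90% of the raw proposals.
for sub_class in sorted(sampled):
    print(sub_class, sampled[sub_class])
```

In this sketch the rare sub-classes are oversampled and the dominant one is undersampled, which is the effect the abstract attributes to a balanced training distribution; a real pipeline would apply the same weighting inside its mini-batch loader.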