Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection

Paper title

Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection

Authors

He, Ye; Zhu, Chao; Yin, Xu-Cheng

Abstract

State-of-the-art pedestrian detectors have made significant progress on non-occluded pedestrians, yet they still struggle under heavy occlusion. The recent occlusion-handling strategy of popular two-stage approaches is to build a two-branch architecture with the help of additional visible-body annotations. Nonetheless, these methods still have weaknesses. Either the two branches are trained independently and combined only by score-level fusion, which cannot guarantee that the detector learns sufficiently robust pedestrian features, or attention mechanisms are used to emphasize only the visible-body features. However, the visible-body features of heavily occluded pedestrians are concentrated in a relatively small area, which easily leads to missed detections. To address these issues, we propose a novel Mutual-Supervised Feature Modulation (MSFM) network to better handle occluded pedestrian detection. The key MSFM module in our network computes a similarity loss between the full-body box and the visible-body box corresponding to the same pedestrian, so that the full-body detector can learn more complete and robust pedestrian features with the assistance of contextual features from the occluding parts. To facilitate the MSFM module, we also propose a novel two-branch architecture consisting of a standard full-body detection branch and an extra visible-body classification branch. The two branches are trained in a mutually supervised way with full-body annotations and visible-body annotations, respectively. To verify the effectiveness of the proposed method, extensive experiments are conducted on two challenging pedestrian datasets, Caltech and CityPersons. Our approach achieves superior performance compared with other state-of-the-art methods on both datasets, especially in the heavy-occlusion case.
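The abstract does not state the exact form of the similarity loss, so the following is only a minimal sketch of the idea, under stated assumptions: each pedestrian contributes one pooled feature vector from the full-body box and one from the visible-body box, and the loss is taken to be the mean cosine distance between each pair. The function names (`cosine_similarity`, `similarity_loss`, `total_loss`) and the weighting parameter `weight` are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def similarity_loss(full_feats, visible_feats):
    """Mean (1 - cosine similarity) over paired features.

    Assumption: full_feats[i] and visible_feats[i] are pooled features of
    the full-body box and the visible-body box of the same pedestrian.
    Cosine distance is an illustrative choice, not the paper's stated loss.
    """
    if len(full_feats) != len(visible_feats):
        raise ValueError("each full-body feature needs a visible-body partner")
    if not full_feats:
        return 0.0
    distances = [
        1.0 - cosine_similarity(f, v) for f, v in zip(full_feats, visible_feats)
    ]
    return sum(distances) / len(distances)


def total_loss(full_body_loss, visible_body_loss, sim_loss, weight=1.0):
    """Combine the two branch losses with the weighted similarity term.

    Assumption: a simple weighted sum; `weight` is a hypothetical knob.
    """
    return full_body_loss + visible_body_loss + weight * sim_loss
```

In this sketch, the similarity term is small when the full-body and visible-body features of the same pedestrian point in the same direction, and it is added to the losses of the two branches so that both are trained jointly rather than fused only at the score level.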
