Paper Title
End-to-End Semi-Supervised Learning for Video Action Detection
Authors
Abstract
In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. We propose a simple end-to-end consistency-based approach that effectively utilizes the unlabeled data. Video action detection requires both action class prediction and spatio-temporal localization of actions. Therefore, we investigate two types of constraints: classification consistency and spatio-temporal consistency. The predominance of background and static regions in a video makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency: 1) temporal coherency and 2) gradient smoothness. Both constraints exploit the temporal continuity of actions in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we show the effectiveness of the proposed approach for video object segmentation on YouTube-VOS, which demonstrates its generalization capability. The proposed approach achieves competitive performance using merely 20% of the annotations on UCF101-24 when compared with recent fully supervised methods. On UCF101-24, it improves the score by +8.9% and +11% at 0.5 f-mAP and v-mAP, respectively, compared to the supervised approach.
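The abstract names the constraints but gives no formulas, so the following is only a rough sketch of one plausible reading, not the paper's actual losses. It assumes that two augmented views of the same unlabeled clip are compared, that localization maps are per-pixel action probabilities, that temporal coherency penalizes first-order differences between adjacent frames, and that gradient smoothness penalizes second-order differences. All function names and the squared-error form are hypothetical.

```python
# Illustrative only: the abstract does not define these terms precisely.
from typing import List

Frame = List[float]  # flattened per-pixel action probabilities for one frame
Clip = List[Frame]   # frames in temporal order


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def classification_consistency(p: List[float], q: List[float]) -> float:
    """Disagreement between class probabilities of two views (assumed MSE)."""
    return mse(p, q)


def spatio_temporal_consistency(loc_a: Clip, loc_b: Clip) -> float:
    """Average per-frame MSE between localization maps of two views."""
    return sum(mse(fa, fb) for fa, fb in zip(loc_a, loc_b)) / len(loc_a)


def temporal_difference(clip: Clip) -> Clip:
    """First-order difference along time: frame[t+1] - frame[t]."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(clip, clip[1:])]


def temporal_coherency(clip: Clip) -> float:
    """Penalize abrupt changes between adjacent frames (assumed form)."""
    diffs = temporal_difference(clip)
    return sum(sum(d * d for d in f) / len(f) for f in diffs) / len(diffs)


def gradient_smoothness(clip: Clip) -> float:
    """Penalize changes in the temporal gradient, i.e. second-order
    differences along time (assumed form). Needs at least 3 frames."""
    return temporal_coherency(temporal_difference(clip))
```

Under these assumptions, a localization map that ramps linearly over time, such as `[[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]`, incurs a non-zero temporal coherency penalty but a zero gradient smoothness penalty, since its temporal gradient is constant. If the two views differ by a spatial transform such as a flip, the maps would need to be mapped back to a common frame before comparison.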