Paper Title

Weakly-supervised Temporal Action Localization by Uncertainty Modeling

Authors

Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun

Abstract

Weakly-supervised temporal action localization aims to learn detecting temporal intervals of action classes with only video-level labels. To this end, it is crucial to separate frames of action classes from the background frames (i.e., frames not belonging to any action classes). In this paper, we present a new perspective on background frames where they are modeled as out-of-distribution samples regarding their inconsistency. Then, background frames can be detected by estimating the probability of each frame being out-of-distribution, known as uncertainty, but it is infeasible to directly learn uncertainty without frame-level labels. To realize the uncertainty learning in the weakly-supervised setting, we leverage the multiple instance learning formulation. Moreover, we further introduce a background entropy loss to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes. Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles. We demonstrate that our model significantly outperforms state-of-the-art methods on the benchmarks, THUMOS'14 and ActivityNet (1.2 & 1.3). Our code is available at https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling.
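The background entropy loss described above encourages a background frame's action probabilities to be uniform over all action classes, which is equivalent to minimizing the cross-entropy between the frame's softmax output and the uniform distribution. A minimal NumPy sketch of that idea (the function name and shapes are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def background_entropy_loss(logits):
    """Sketch of a background entropy loss: cross-entropy between the
    per-frame action distribution and the uniform distribution over C
    classes. It is minimized when the distribution is exactly uniform.

    logits: array of shape (num_background_frames, C)."""
    # numerically stable softmax over the action classes
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # -(1/C) * sum_c log p_c, averaged over frames
    return float(-np.log(probs + 1e-8).mean())

# A uniform background frame attains the minimum, log(C);
# a peaked (action-like) frame is penalized more heavily.
loss_uniform = background_entropy_loss(np.zeros((1, 4)))
loss_peaked = background_entropy_loss(np.array([[10.0, 0.0, 0.0, 0.0]]))
```

With four classes, `loss_uniform` equals `log(4) ≈ 1.386`, while the peaked frame incurs a strictly larger loss, so gradient descent pushes background frames toward class-agnostic (uniform) action scores.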
