论文标题
Toyota Smarthome未修剪:现实世界中未修剪的视频进行活动检测
Toyota Smarthome Untrimmed: Real-World Untrimmed Videos for Activity Detection
论文作者
论文摘要
设计可以在日常生活环境中成功部署的活动检测系统需要构成现实情况典型挑战的数据集。在本文中,我们介绍了一个新的未修剪日常生活数据集,该数据集具有几个现实世界中的挑战:Toyota Smarthome Untrimmed(TSU)。 TSU包含以自发方式进行的各种活动。数据集包含密集的注释,包括基本,复合活动和涉及与对象相互作用的活动。我们提供了我们数据集所需的现实挑战的分析,突出了检测算法的开放问题。我们表明,当前的最新方法无法在TSU数据集上实现令人满意的性能。因此,我们提出了一种新的基线方法,以应对数据集提供的新挑战。此方法利用一种模态(即视线流)生成注意力权重,以指导另一种模态(即RGB)以更好地检测活动边界。这对于检测以高时间差异为特征的活动特别有益。我们表明,我们建议在TSU和另一个受欢迎的挑战数据集Charades上胜过最先进的方法的方法。
Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide variety of activities performed in a spontaneous manner. The dataset contains dense annotations including elementary, composite activities and activities involving interactions with objects. We provide an analysis of the real-world challenges featured by our dataset, highlighting the open issues for detection algorithms. We show that current state-of-the-art methods fail to achieve satisfactory performance on the TSU dataset. Therefore, we propose a new baseline method for activity detection to tackle the novel challenges provided by our dataset. This method leverages one modality (i.e. optic flow) to generate the attention weights to guide another modality (i.e RGB) to better detect the activity boundaries. This is particularly beneficial to detect activities characterized by high temporal variance. We show that the method we propose outperforms state-of-the-art methods on TSU and on another popular challenging dataset, Charades.