Paper Title
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Authors
Abstract
Learning spatial-temporal relations among multiple actors is crucial for group activity recognition. Different group activities often exhibit diverse interactions between the actors in a video. Hence, it is often difficult to model complex group activities from a single view of spatial-temporal actor evolution. To tackle this problem, we propose a distinct Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers in two complementary orders, enhancing actor relations by integrating the merits of different spatiotemporal paths. Moreover, we introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between the two interactive paths of Dual-AI. Via self-supervised actor consistency at both the frame and video levels, MAC-Loss can effectively distinguish individual actor representations to reduce action confusion among different actors. Consequently, Dual-AI can boost group activity recognition by fusing these discriminative features of different actors. To evaluate the proposed approach, we conduct extensive experiments on widely used benchmarks, including the Volleyball, Collective Activity, and NBA datasets. The proposed Dual-AI achieves state-of-the-art performance on all of these datasets. Notably, Dual-AI with 50% of the training data outperforms a number of recent approaches trained with 100% of the training data. This confirms the generalization power of Dual-AI for group activity recognition, even under the challenging scenario of limited supervision.
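The dual-path idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no layer details, so the parameter-free attention, the averaging fusion, the InfoNCE-style loss, the temperature value, and every function name below are assumptions chosen only to show how two complementary orderings (spatial then temporal, and temporal then spatial) and a frame-level plus video-level actor consistency term might fit together.

```python
import numpy as np


def self_attention(x):
    """Parameter-free scaled dot-product self-attention with a residual.

    x has shape (..., L, D); attention runs over the L axis.
    A real transformer block would add learned projections, an MLP and
    normalization; they are omitted here (assumption, for brevity).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)      # (..., L, L)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ x


def spatial_block(x):
    """x: (T, N, D). Actors attend to each other within each frame."""
    return self_attention(x)


def temporal_block(x):
    """x: (T, N, D). Each actor attends to itself across frames."""
    per_actor = np.swapaxes(x, 0, 1)                      # (N, T, D)
    return np.swapaxes(self_attention(per_actor), 0, 1)   # back to (T, N, D)


def dual_path(x):
    """Run the two complementary orders on the same actor features."""
    st = temporal_block(spatial_block(x))   # spatial first, then temporal
    ts = spatial_block(temporal_block(x))   # temporal first, then spatial
    return st, ts


def actor_contrastive_loss(a, b, temperature=0.1):
    """InfoNCE-style loss (hypothetical stand-in for MAC-Loss).

    a, b: (N, D). Actor i in path a should match actor i in path b
    and differ from the other actors in path b.
    """
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    logits = a @ b.T / temperature                        # (N, N)
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy input: T=4 frames, N=6 actors, D=8 feature dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 8))

st, ts = dual_path(x)

# Two scales of actor consistency between the paths.
frame_level = float(np.mean(
    [actor_contrastive_loss(st[t], ts[t]) for t in range(x.shape[0])]
))
video_level = actor_contrastive_loss(st.mean(axis=0), ts.mean(axis=0))

# Fuse the paths and pool over frames and actors into one group feature.
fused = (st + ts) / 2.0
group_feature = fused.mean(axis=(0, 1))                   # shape (D,)
```

In this sketch, `group_feature` would feed a group activity classifier, and `frame_level + video_level` would be added to the classification loss. How the paper weights these terms is not stated in the abstract.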