Paper Title
Temporal Relational Modeling with Self-Supervision for Action Segmentation
Paper Authors
Paper Abstract
Temporal relational modeling in video is essential for human action understanding, such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, it remains a challenge to apply graph convolution networks to long video sequences effectively. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, in this paper we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs whose nodes represent frames from different moments in a video. Moreover, to enhance the temporal reasoning ability of the proposed model, an auxiliary self-supervised task is introduced to encourage the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at https://github.com/redwang/DTGRM.
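To make the core idea concrete, below is a minimal sketch of how reasoning over multi-level dilated temporal graphs could look in PyTorch. It is not the authors' implementation (see the official repository for that); the names `build_dilated_adjacency` and `DilatedTemporalGraphLayer` and all design details here are illustrative assumptions: each frame is a node connected to the frames `dilation` steps before and after it, and stacking layers with increasing dilation lets the model relate frames at progressively longer time spans.

```python
# Illustrative sketch only: dilated temporal graph reasoning over frame features.
# Not the official DTGRM code; helper names and hyperparameters are assumptions.
import torch
import torch.nn as nn


def build_dilated_adjacency(num_frames: int, dilation: int) -> torch.Tensor:
    """Adjacency of a dilated temporal graph: frame t is linked to t +/- dilation."""
    adj = torch.zeros(num_frames, num_frames)
    idx = torch.arange(num_frames)
    adj[idx, idx] = 1.0                          # self-loops
    adj[idx[:-dilation], idx[dilation:]] = 1.0   # forward edges (t -> t + dilation)
    adj[idx[dilation:], idx[:-dilation]] = 1.0   # backward edges (t -> t - dilation)
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)


class DilatedTemporalGraphLayer(nn.Module):
    """One graph-convolution step over a dilated temporal graph."""

    def __init__(self, in_dim: int, out_dim: int, dilation: int):
        super().__init__()
        self.dilation = dilation
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, C) frame-wise features of a single video
        adj = build_dilated_adjacency(x.size(0), self.dilation).to(x.device)
        return torch.relu(adj @ self.linear(x))


if __name__ == "__main__":
    frames = torch.randn(64, 128)  # 64 frames with 128-d features
    # Multi-level reasoning: later layers connect frames that are further apart.
    layers = nn.Sequential(
        *[DilatedTemporalGraphLayer(128, 128, d) for d in (1, 2, 4, 8)]
    )
    out = layers(frames)
    print(out.shape)  # torch.Size([64, 128])
```

The auxiliary self-supervised task described in the abstract could, under the same assumptions, be realized by shuffling or swapping a few frames in the input sequence and training an extra per-frame head on top of these graph features to predict which positions were corrupted, encouraging the module to detect and correct wrong temporal relations.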