Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network

Paper title

Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network

Authors

Hao Xing, Darius Burschka

Abstract

In skeleton-based action recognition, Graph Convolutional Networks model human skeletal joints as vertices and connect them through an adjacency matrix, which can be seen as a local attention mask. However, in most existing Graph Convolutional Networks, the local attention mask is defined by the natural connections of the human skeleton joints and ignores dynamic relations, for example between the head, hand and foot joints. In addition, the attention mechanism, which has proven effective in Natural Language Processing and image description, is rarely investigated in existing methods. In this work, we propose a new adaptive spatial attention layer that extends the local attention map to a global one based on relative distance and relative angle information. Moreover, we design a new initial graph adjacency matrix that connects the head, hands and feet, which yields a visible improvement in action recognition accuracy. The proposed model is evaluated on two large-scale and challenging datasets in the field of human activities in daily life: NTU-RGB+D and Kinetics skeleton. The results demonstrate that our model performs strongly on both datasets.
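The idea of the adjacency matrix acting as a local attention mask can be illustrated with a minimal sketch. It assumes a toy six-joint skeleton whose joint names and indices are made up for illustration (they are not the NTU-RGB+D or Kinetics layouts), and the extra edges simply link every pair of head, hand and foot joints, which may differ from the matrix the authors actually designed. The adaptive spatial attention layer itself is not reproduced here.

```python
# Toy 6-joint skeleton; names and indices are illustrative assumptions,
# not the NTU-RGB+D or Kinetics joint layouts.
JOINTS = ["head", "torso", "left_hand", "right_hand", "left_foot", "right_foot"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5)]  # natural connections
END_JOINTS = [0, 2, 3, 4, 5]  # head, hands and feet


def build_adjacency(num_joints, edges, extra_edges=()):
    """Symmetric adjacency with self-loops, row-normalized so each row sums to 1."""
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0
    for i, j in list(edges) + list(extra_edges):
        adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def aggregate(adj, features):
    """One parameter-free graph aggregation step: out = A @ X."""
    dim = len(features[0])
    return [
        [sum(adj[i][j] * features[j][d] for j in range(len(features)))
         for d in range(dim)]
        for i in range(len(adj))
    ]


# Hypothetical extra edges: every pair of end joints is linked directly.
extra = [(a, b) for k, a in enumerate(END_JOINTS) for b in END_JOINTS[k + 1:]]

natural = build_adjacency(len(JOINTS), BONES)
extended = build_adjacency(len(JOINTS), BONES, extra)

# A one-hot feature on the left hand shows which joints receive its
# information after a single aggregation step under each mask.
x = [[1.0 if i == 2 else 0.0] for i in range(len(JOINTS))]
reach_natural = [i for i, row in enumerate(aggregate(natural, x)) if row[0] > 0]
reach_extended = [i for i, row in enumerate(aggregate(extended, x)) if row[0] > 0]
```

With only the natural bones, the left-hand feature reaches just the torso and the hand itself in one step; with the extra edges it also reaches the head, the other hand and both feet, which is the kind of long-range relation the abstract says the natural-connection mask ignores.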
