Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Paper Title

Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition

Authors

Hu, Lianyu, Liu, Shenglan, Feng, Wei

Abstract

Current methods in skeleton-based action recognition mainly focus on capturing long-term temporal dependencies, because skeleton sequences are typically long (>128 frames) and modeling them poses a challenging problem for previous approaches. Under such conditions, short-term dependencies, which are critical for classifying similar actions, are rarely considered formally. Most current approaches consist of interleaved spatial-only and temporal-only modules, which hinder direct information flow among joints in adjacent frames and therefore make these approaches less able to capture short-term motion and distinguish similar action pairs. To address this limitation, we propose a general framework, coined STGAT, to model cross-spacetime information flow. It equips the spatial-only modules with spatial-temporal modeling for regional perception. While STGAT is theoretically effective for spatial-temporal modeling, we propose three simple modules that reduce local spatial-temporal feature redundancy and further unlock the potential of STGAT by (1) narrowing the scope of the self-attention mechanism, (2) dynamically weighting joints along the temporal dimension, and (3) separating subtle motion from static features. As a robust feature extractor, STGAT generalizes better than previous methods when classifying similar actions, as shown by both qualitative and quantitative results. STGAT achieves state-of-the-art performance on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Code is released.
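
The abstract contrasts interleaved spatial-only and temporal-only modules with direct information flow among joints in adjacent frames. As a rough illustration only, and not the authors' STGAT implementation, the sketch below computes self-attention over a local spacetime window: each joint at frame t attends to every joint in frames t-r through t+r. The function name `windowed_st_attention`, the `radius` parameter, the nested-list input layout [T][V][C] (frames, joints, channels), and the omission of learned query/key/value projections are all assumptions made to keep the example short and dependency-free.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def windowed_st_attention(x, radius):
    """Hypothetical sketch of local spatial-temporal self-attention.

    x: nested list of shape [T][V][C] (frames, joints, channels).
    Each joint (t, v) attends to all joints in frames t-radius..t+radius.
    Assumption: raw features serve as queries, keys and values
    (no learned projections, single head).
    """
    T, V, C = len(x), len(x[0]), len(x[0][0])
    scale = 1.0 / math.sqrt(C)
    out = []
    for t in range(T):
        lo, hi = max(0, t - radius), min(T - 1, t + radius)
        # Gather every joint in the local temporal window (clipped at sequence ends).
        neigh = [x[s][u] for s in range(lo, hi + 1) for u in range(V)]
        frame_out = []
        for v in range(V):
            q = x[t][v]
            # Scaled dot-product scores against all joints in the window.
            weights = softmax([dot(q, k) * scale for k in neigh])
            # Output is the attention-weighted average of the window's features.
            frame_out.append(
                [sum(w * k[c] for w, k in zip(weights, neigh)) for c in range(C)]
            )
        out.append(frame_out)
    return out
```

With `radius=0` the function mixes features only among the joints of a single frame, which corresponds to spatial-only attention; with `radius=1` each joint also sees every joint in the immediately preceding and following frames, so short-term cross-frame relations are modeled directly. Restricting attention to a small window rather than the whole sequence is one possible reading of "narrowing the scope of the self-attention mechanism", and it keeps the number of keys per query at (2r+1)·V instead of T·V.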
