Paper Title
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation
Paper Authors
Paper Abstract
3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints. Recently, Transformers have been adopted to encode the long-range dependencies between the joints in the spatial and temporal domains. While they have shown excellence in modeling long-range dependencies, studies have noted the need to improve the locality of vision Transformers. In this direction, we propose a novel pose estimation Transformer featuring rich representations of body joints, which are critical for capturing subtle changes across frames (i.e., inter-feature representation). Specifically, through two novel interaction modules, Cross-Joint Interaction and Cross-Frame Interaction, the model explicitly encodes the local and global dependencies between the body joints. The proposed architecture achieves state-of-the-art performance on two popular 3D human pose estimation datasets, Human3.6M and MPI-INF-3DHP. In particular, our proposed CrossFormer method boosts performance by 0.9% and 0.3% over the closest counterpart, PoseFormer, under the detected 2D pose and ground-truth 2D pose settings, respectively.
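The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the high-level idea it describes: one attention block that mixes information across joints within each frame (cross-joint) and another that mixes information across frames for each joint (cross-frame), followed by a regression head for the centre frame. All module names, tensor shapes, dimensions, and the PyTorch-based design are illustrative assumptions, not the authors' actual CrossFormer implementation.

```python
# Hypothetical sketch of a cross spatio-temporal pose lifter.
# NOT the authors' CrossFormer code; names and sizes are assumptions.
import torch
import torch.nn as nn


class CrossSpatioTemporalBlock(nn.Module):
    """One attention pass over the joint axis, one over the frame axis."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        # attention across joints within a single frame (cross-joint)
        self.joint_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        # attention across frames for a single joint (cross-frame)
        self.frame_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )

    def forward(self, x):
        # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        # cross-joint: fold frames into the batch, attend over joints
        x = self.joint_attn(x.reshape(b * t, j, d)).reshape(b, t, j, d)
        # cross-frame: fold joints into the batch, attend over frames
        x = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        x = self.frame_attn(x).reshape(b, j, t, d).permute(0, 2, 1, 3)
        return x


class PoseLifter(nn.Module):
    """Lift a sequence of 2D poses to the 3D pose of the centre frame."""

    def __init__(self, joints=17, frames=9, dim=64):
        super().__init__()
        self.embed = nn.Linear(2, dim)  # per-joint 2D coordinates -> features
        self.joint_pos = nn.Parameter(torch.zeros(1, 1, joints, dim))
        self.frame_pos = nn.Parameter(torch.zeros(1, frames, 1, dim))
        self.block = CrossSpatioTemporalBlock(dim)
        self.head = nn.Linear(dim, 3)   # per-joint features -> 3D coordinates

    def forward(self, pose2d):
        # pose2d: (batch, frames, joints, 2)
        feats = self.embed(pose2d) + self.joint_pos + self.frame_pos
        feats = self.block(feats)
        centre = feats[:, feats.shape[1] // 2]  # features of the centre frame
        return self.head(centre)                # (batch, joints, 3)


if __name__ == "__main__":
    model = PoseLifter()
    out = model(torch.randn(2, 9, 17, 2))
    print(out.shape)  # torch.Size([2, 17, 3])
```

The split into a joint-axis pass and a frame-axis pass simply mirrors the spatial/temporal factorisation described in the abstract; how the actual Cross-Joint and Cross-Frame Interaction modules inject locality is specified in the paper itself.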