Paper Title
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video
Paper Authors
Paper Abstract
Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits their ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion and estimate accurate and temporally coherent 3D human pose and shape from a video. Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range in the sequence that needs attention, so as to better capture motion continuity dependencies. Then, we develop a hierarchical attentive feature integration (HAFI) module to effectively combine adjacent past and future feature representations to strengthen temporal correlation and refine the feature representation of the current frame. By coupling the MoCA and HAFI modules, the proposed MPS-Net excels at estimating 3D human pose and shape in video. Though conceptually simple, our MPS-Net not only outperforms the state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmark datasets, but also uses fewer network parameters. The video demos can be found at https://mps-net.github.io/MPS-Net/.
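The two ideas in the abstract, non-local temporal attention over a frame sequence and attentive integration of a frame with its past/future neighbors, can be illustrated with a minimal NumPy sketch. This is a hedged toy example under assumed per-frame feature vectors; it is not the authors' MoCA or HAFI implementation, and all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(feats):
    """Toy non-local temporal attention: every frame attends to all
    frames in the sequence (illustrative only; the actual MoCA module
    additionally recalibrates the attention range from motion cues)."""
    # feats: (T, D) per-frame feature vectors
    scores = feats @ feats.T / np.sqrt(feats.shape[1])  # (T, T) frame affinities
    return softmax(scores, axis=-1) @ feats             # attended features, (T, D)

def integrate_neighbors(feats, t, radius=1):
    """Toy attentive fusion of frame t with its past/future neighbors
    (illustrative only; the actual HAFI module is hierarchical)."""
    lo, hi = max(0, t - radius), min(len(feats), t + radius + 1)
    window = feats[lo:hi]                               # (W, D) temporal window
    # attention weights of the current frame over its window
    w = softmax(window @ feats[t] / np.sqrt(feats.shape[1]))
    return w @ window                                   # refined feature, (D,)
```

Both steps weight other frames by feature similarity, which is what lets distant but relevant frames contribute, unlike a fixed-kernel convolution or a strictly sequential recurrence.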