Paper Title

AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild

Authors

Zhe Zhang, Chunyu Wang, Weichao Qiu, Wenhu Qin, Wenjun Zeng

Abstract

Occlusion is probably the biggest challenge for human pose estimation in the wild. Typical solutions often rely on intrusive sensors such as IMUs to detect occluded joints. To make the task truly unconstrained, we present AdaFuse, an adaptive multiview fusion method, which can enhance the features in occluded views by leveraging those in visible views. The core of AdaFuse is to determine the point-point correspondence between two views, which we solve effectively by exploring the sparsity of the heatmap representation. We also learn an adaptive fusion weight for each camera view to reflect its feature quality, in order to reduce the chance that good features are undesirably corrupted by "bad" views. The fusion model is trained end-to-end with the pose estimation network, and can be directly applied to new camera configurations without additional adaptation. We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic. It outperforms the state of the art on all of them. We also create a large-scale synthetic dataset, Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints, as it provides occlusion labels for every joint in the images. The dataset and code are released at https://github.com/zhezh/adafuse-3d-human-pose.
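The fusion idea in the abstract can be sketched numerically. The following is a minimal illustration (not the paper's implementation): it assumes each view's joint heatmap has already been warped into the reference view via the cross-view correspondences, and uses fixed per-view quality weights in place of the learned adaptive ones. The function name `adaptive_fuse` and the toy heatmaps are hypothetical.

```python
import numpy as np

def adaptive_fuse(heatmaps, quality_scores):
    """Weighted fusion of same-joint heatmaps from multiple camera views.

    heatmaps:       (V, H, W) array, one 2D heatmap per view, already
                    aligned to the reference view (an assumption here;
                    AdaFuse establishes this alignment via point-point
                    correspondences found from heatmap sparsity).
    quality_scores: (V,) per-view quality weights (learned in the paper;
                    fixed numbers here for illustration).
    Returns the fused (H, W) heatmap.
    """
    w = np.asarray(quality_scores, dtype=float)
    w = w / w.sum()  # normalize weights across views
    return np.tensordot(w, np.asarray(heatmaps, dtype=float), axes=1)

# Toy example: view 0 is occluded (flat, uninformative heatmap),
# view 1 sees the joint clearly at pixel (3, 5).
occluded = np.full((8, 8), 0.1)
visible = np.zeros((8, 8))
visible[3, 5] = 1.0

fused = adaptive_fuse([occluded, visible], quality_scores=[0.2, 0.8])
peak = np.unravel_index(fused.argmax(), fused.shape)  # peak at (3, 5)
```

Down-weighting the occluded view lets the visible view's response dominate the fused heatmap, which is the intuition behind the learned per-view fusion weights.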
