Paper Title
PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation
Paper Authors
Paper Abstract
The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not yet fully migrated to the field of 3D point cloud learning. Unlike existing pre-training paradigms for deep point cloud feature extractors, which fall within the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, PointVST, driven by a novel self-supervised pretext task: cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin by deducing view-conditioned point-wise embeddings through the insertion of a viewpoint indicator, then adaptively aggregate them into a view-specific global codeword, which is fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations across various downstream task scenarios demonstrate that PointVST achieves consistent and prominent performance gains over current state-of-the-art approaches, as well as satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST.
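To make the described pipeline concrete, below is a minimal PyTorch sketch of a view-specific point-to-image translation head: per-point backbone features are fused with a viewpoint embedding, adaptively pooled into a view-specific global codeword, and decoded by 2D transposed convolutions into an image. All module names, dimensions, and the attention-based pooling scheme are illustrative assumptions for exposition, not the authors' exact implementation (see the linked repository for that).

```python
# Hypothetical sketch of a PointVST-style pretext head: view-conditioned
# embedding -> view-specific codeword -> 2D convolutional translation.
# Dimensions and layer choices are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class ViewSpecificTranslator(nn.Module):
    def __init__(self, feat_dim=256, view_dim=32, code_dim=512, img_res=128):
        super().__init__()
        # Embed the viewpoint indicator (here assumed to be a 3D direction).
        self.view_embed = nn.Sequential(nn.Linear(3, view_dim), nn.ReLU())
        # Fuse per-point backbone features with the viewpoint embedding
        # to obtain view-conditioned point-wise embeddings.
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim + view_dim, code_dim), nn.ReLU())
        # Per-point attention scores for adaptive aggregation into a
        # single view-specific global codeword.
        self.attn = nn.Linear(code_dim, 1)
        # 2D convolutional translation head: codeword -> rendered image.
        self.start_res = img_res // 16
        self.proj = nn.Linear(code_dim, 64 * self.start_res ** 2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, point_feats, viewpoint):
        # point_feats: (B, N, feat_dim) from a point cloud backbone;
        # viewpoint: (B, 3) indicator of the target rendering view.
        B, N, _ = point_feats.shape
        v = self.view_embed(viewpoint).unsqueeze(1).expand(B, N, -1)
        h = self.fuse(torch.cat([point_feats, v], dim=-1))  # (B, N, code_dim)
        w = torch.softmax(self.attn(h), dim=1)              # (B, N, 1)
        codeword = (w * h).sum(dim=1)                       # (B, code_dim)
        x = self.proj(codeword).view(B, 64, self.start_res, self.start_res)
        return self.decoder(x)                              # (B, 1, 128, 128)
```

Under this reading of the abstract, pre-training would compare the predicted image against a ground-truth 2D rendering from the same viewpoint (e.g., with a pixel-wise reconstruction loss), and the translation head would be discarded afterwards, leaving the pre-trained point cloud backbone for downstream tasks.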