Paper Title

MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

Authors

Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He

Abstract

Recently, self-supervised pre-training has advanced Vision Transformers on various tasks w.r.t. different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. Since applying Transformer architectures to new modalities is usually non-trivial, we first adapt the Vision Transformer to 3D mesh data processing, i.e., Mesh Transformer. Specifically, we divide a mesh into several non-overlapping local patches, each containing the same number of faces, and use the 3D position of each patch's center point to form positional embeddings. Inspired by MAE, we explore how pre-training on 3D mesh data with the Transformer-based structure benefits downstream 3D mesh analysis tasks. We first randomly mask some patches of the mesh and feed the corrupted mesh into the Mesh Transformer. Then, by reconstructing the information of the masked patches, the network is able to learn discriminative representations for mesh data. We therefore name our method MeshMAE; it yields state-of-the-art or comparable performance on mesh analysis tasks, i.e., classification and segmentation. In addition, we conduct comprehensive ablation studies to show the effectiveness of the key designs in our method.
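The masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the mask ratio, and the flattened per-patch feature layout are all hypothetical, and the actual MeshMAE pipeline operates on learned patch embeddings inside a Transformer.

```python
import numpy as np

def mask_mesh_patches(patch_feats, centers, mask_ratio=0.5, seed=0):
    """MAE-style random masking of mesh patches (illustrative sketch).

    patch_feats: (P, D) array, one flattened feature vector per patch
                 (each patch holds the same number of faces).
    centers:     (P, 3) array, the 3D center point of each patch,
                 which the paper uses to form positional embeddings.
    Returns the visible patch features, their center positions,
    and the indices of the masked patches to be reconstructed.
    """
    rng = np.random.default_rng(seed)
    num_patches = patch_feats.shape[0]
    num_mask = int(mask_ratio * num_patches)

    # Shuffle patch indices and split into masked / visible sets.
    perm = rng.permutation(num_patches)
    masked_idx = np.sort(perm[:num_mask])
    visible_idx = np.sort(perm[num_mask:])

    # Only the visible patches (with their positional information) would be
    # fed to the encoder; a decoder later reconstructs the masked patches.
    return patch_feats[visible_idx], centers[visible_idx], masked_idx
```

A downstream encoder would embed the visible features, add positional embeddings derived from the returned centers, and a reconstruction loss on the patches listed in `masked_idx` would drive pre-training.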
