Title
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images
Authors
Abstract
The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets. However, compared with 2D image datasets, the pre-training data currently available for 3D point clouds is limited. To overcome this limitation, we propose a knowledge distillation method that lets 3D point cloud pre-trained models acquire knowledge directly from a 2D representation learning model, specifically the image encoder of CLIP, through concept alignment. Concretely, we introduce a cross-attention mechanism to extract concept features from 3D point clouds and compare them with the semantic information from 2D images. In this scheme, the point cloud pre-trained models learn directly from the rich information contained in 2D teacher models. Extensive experiments demonstrate that the proposed knowledge distillation scheme achieves higher accuracy than state-of-the-art 3D pre-training methods on downstream tasks over both synthetic and real-world datasets, including object classification, object detection, semantic segmentation, and part segmentation.
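To make the mechanism concrete, the sketch below illustrates the kind of concept cross-attention and alignment loss the abstract describes: learnable concept queries attend over per-point features, and the resulting 3D concept features are aligned with 2D teacher (e.g. CLIP image encoder) features via cosine similarity. This is a minimal NumPy illustration under assumed shapes and names, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concept_cross_attention(point_feats, concept_queries):
    """Cross-attention: K learnable concept queries attend over N point features.

    point_feats:     (N, d) features from a 3D point cloud encoder (assumed)
    concept_queries: (K, d) learnable concept embeddings (assumed)
    returns:         (K, d) concept features aggregated from the point cloud
    """
    d = point_feats.shape[1]
    # Attention weights over points for each concept query, scaled by sqrt(d).
    attn = softmax(concept_queries @ point_feats.T / np.sqrt(d), axis=-1)  # (K, N)
    return attn @ point_feats  # (K, d)

def alignment_loss(concept_feats, image_feats):
    """Concept-alignment distillation loss: 1 - mean cosine similarity
    between 3D concept features and 2D teacher features (both (K, d))."""
    a = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    b = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(a * b, axis=1))
```

In a full pipeline, the concept queries and the point encoder would be trained to minimize this loss against frozen CLIP image features, so the 3D student inherits the teacher's semantic structure.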