Title
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images
Authors
Abstract
The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets. However, compared with 2D image datasets, the pre-training data currently available for 3D point clouds is limited. To overcome this limitation, we propose a knowledge distillation method that lets 3D point cloud pre-trained models acquire knowledge directly from a 2D representation learning model, specifically the image encoder of CLIP, through concept alignment. Concretely, we introduce a cross-attention mechanism to extract concept features from 3D point clouds and compare them with the semantic information from 2D images. In this scheme, the point cloud pre-trained models learn directly from the rich information contained in 2D teacher models. Extensive experiments demonstrate that the proposed knowledge distillation scheme achieves higher accuracy than state-of-the-art 3D pre-training methods on downstream tasks over both synthetic and real-world datasets, including object classification, object detection, semantic segmentation, and part segmentation.
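To make the mechanism concrete, the sketch below illustrates the kind of concept cross-attention and alignment loss the abstract describes: learnable concept queries attend over per-point features, and the resulting 3D concept features are aligned with 2D teacher (e.g. CLIP image encoder) features via cosine similarity. This is a minimal NumPy illustration under assumed shapes and names, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concept_cross_attention(point_feats, concept_queries):
    """Cross-attention: K learnable concept queries attend over N point features.

    point_feats:     (N, d) features from a 3D point cloud encoder (assumed)
    concept_queries: (K, d) learnable concept embeddings (assumed)
    returns:         (K, d) concept features aggregated from the point cloud
    """
    d = point_feats.shape[1]
    # Attention weights over points for each concept query, scaled by sqrt(d).
    attn = softmax(concept_queries @ point_feats.T / np.sqrt(d), axis=-1)  # (K, N)
    return attn @ point_feats  # (K, d)

def alignment_loss(concept_feats, image_feats):
    """Concept-alignment distillation loss: 1 - mean cosine similarity
    between 3D concept features and 2D teacher features (both (K, d))."""
    a = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    b = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(a * b, axis=1))
```

In a full pipeline, the concept queries and the point encoder would be trained to minimize this loss against frozen CLIP image features, so the 3D student inherits the teacher's semantic structure.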