Paper Title
CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
Paper Authors
Paper Abstract
Transformers have gained much attention by outperforming convolutional neural networks in many 2D vision tasks. However, they are known to have generalization problems and to rely on large-scale pre-training and sophisticated training techniques. When applied to 3D tasks, the irregular data structure and limited data scale add to the difficulty of applying transformers. We propose CodedVTR (Codebook-based Voxel TRansformer), which improves the data efficiency and generalization ability of 3D sparse voxel transformers. On the one hand, we propose codebook-based attention, which projects the attention space into a subspace represented by combinations of "prototypes" in a learnable codebook. This regularizes attention learning and improves generalization. On the other hand, we propose geometry-aware self-attention, which utilizes geometric information (geometric pattern, density) to guide attention learning. CodedVTR can be embedded into existing sparse convolution-based methods and brings consistent performance improvements on indoor and outdoor 3D semantic segmentation tasks.
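The core idea of codebook-based attention can be illustrated with a minimal sketch: instead of computing unconstrained attention weights, each voxel's attention over its neighborhood is a learned convex combination of a small set of prototype attention patterns. The sketch below is an assumption-laden illustration in PyTorch, not the paper's implementation; the class name `CodebookAttention`, the gating design, and the assumption that neighbor features are gathered externally (e.g., a 3x3x3 neighborhood of 27 voxels) are all hypothetical, and the paper's full method additionally uses geometric guidance not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookAttention(nn.Module):
    """Minimal sketch: attention weights are restricted to a subspace
    spanned by K learnable prototype patterns over a fixed neighborhood.
    Hypothetical illustration only; not the authors' implementation."""

    def __init__(self, dim: int, num_prototypes: int = 8, neighborhood: int = 27):
        super().__init__()
        # Codebook: K prototype attention patterns over M neighborhood positions.
        self.codebook = nn.Parameter(torch.randn(num_prototypes, neighborhood))
        # Gating network: predicts per-voxel mixing weights over the prototypes.
        self.gate = nn.Linear(dim, num_prototypes)
        self.value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, neighbor_feats: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) center voxel features
        # neighbor_feats: (N, M, dim) features gathered from each voxel's neighborhood
        mix = F.softmax(self.gate(x), dim=-1)          # (N, K) prototype mixing weights
        protos = F.softmax(self.codebook, dim=-1)      # (K, M) normalized prototype patterns
        weights = mix @ protos                         # (N, M) attention in the codebook subspace
        v = self.value(neighbor_feats)                 # (N, M, dim) value projections
        return torch.einsum('nm,nmd->nd', weights, v)  # weighted neighborhood aggregation

# Usage sketch with random data standing in for sparse voxel features:
attn = CodebookAttention(dim=32)
x = torch.randn(100, 32)              # 100 voxels
neighbors = torch.randn(100, 27, 32)  # their gathered 3x3x3 neighborhoods
out = attn(x, neighbors)              # (100, 32)
```

Because the attention weights are forced through the low-dimensional codebook, the model has far fewer effective attention degrees of freedom per voxel, which is one plausible reading of how this design regularizes attention learning on limited 3D data.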