Paper Title
CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
Paper Authors
Paper Abstract
Transformers have gained much attention by outperforming convolutional neural networks in many 2D vision tasks. However, they are known to have generalization problems and to rely on large-scale pre-training and sophisticated training techniques. When applied to 3D tasks, the irregular data structure and limited data scale add to the difficulty of applying transformers. We propose CodedVTR (Codebook-based Voxel TRansformer), which improves the data efficiency and generalization ability of 3D sparse voxel transformers. On the one hand, we propose codebook-based attention, which projects the attention space into a subspace represented by combinations of "prototypes" in a learnable codebook. This regularizes attention learning and improves generalization. On the other hand, we propose geometry-aware self-attention, which utilizes geometric information (geometric pattern, density) to guide attention learning. CodedVTR can be embedded into existing sparse convolution-based methods and brings consistent performance improvements on indoor and outdoor 3D semantic segmentation tasks.
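The core idea of codebook-based attention can be illustrated with a minimal sketch: instead of computing unconstrained attention weights, each voxel's attention over its neighborhood is a learned convex combination of a small set of prototype attention patterns. The sketch below is an assumption-laden illustration in PyTorch, not the paper's implementation; the class name `CodebookAttention`, the gating design, and the assumption that neighbor features are gathered externally (e.g., a 3x3x3 neighborhood of 27 voxels) are all hypothetical, and the paper's full method additionally uses geometric guidance not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookAttention(nn.Module):
    """Minimal sketch: attention weights are restricted to a subspace
    spanned by K learnable prototype patterns over a fixed neighborhood.
    Hypothetical illustration only; not the authors' implementation."""

    def __init__(self, dim: int, num_prototypes: int = 8, neighborhood: int = 27):
        super().__init__()
        # Codebook: K prototype attention patterns over M neighborhood positions.
        self.codebook = nn.Parameter(torch.randn(num_prototypes, neighborhood))
        # Gating network: predicts per-voxel mixing weights over the prototypes.
        self.gate = nn.Linear(dim, num_prototypes)
        self.value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, neighbor_feats: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) center voxel features
        # neighbor_feats: (N, M, dim) features gathered from each voxel's neighborhood
        mix = F.softmax(self.gate(x), dim=-1)          # (N, K) prototype mixing weights
        protos = F.softmax(self.codebook, dim=-1)      # (K, M) normalized prototype patterns
        weights = mix @ protos                         # (N, M) attention in the codebook subspace
        v = self.value(neighbor_feats)                 # (N, M, dim) value projections
        return torch.einsum('nm,nmd->nd', weights, v)  # weighted neighborhood aggregation

# Usage sketch with random data standing in for sparse voxel features:
attn = CodebookAttention(dim=32)
x = torch.randn(100, 32)              # 100 voxels
neighbors = torch.randn(100, 27, 32)  # their gathered 3x3x3 neighborhoods
out = attn(x, neighbors)              # (100, 32)
```

Because the attention weights are forced through the low-dimensional codebook, the model has far fewer effective attention degrees of freedom per voxel, which is one plausible reading of how this design regularizes attention learning on limited 3D data.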