Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Paper Title

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Authors

Junbo Zhang, Guofan Fan, Guanghan Wang, Zhengyuan Su, Kaisheng Ma, Li Yi

Abstract

Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context receive enough emphasis in a 3D scene understanding network trained end to end. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract knowledge about object relationships and object attributes. We then inject this knowledge into 3D feature learning through three classification-based auxiliary tasks. This language-assisted training can be combined with modern object detection and instance segmentation methods to promote 3D semantic scene understanding, especially in a label-deficient regime. Moreover, the 3D features learned with language assistance are better aligned with language features, which can benefit various 3D-language multimodal tasks. Experiments on several benchmarks of 3D-only and 3D-language tasks demonstrate the effectiveness of our language-assisted 3D feature learning. Code is available at https://github.com/Asterisci/Language-Assisted-3D.
