A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

Paper Title

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

Authors

Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li

Abstract

Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone network to extract high-resolution feature maps, as in the DilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have high computational complexity and memory consumption. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level features from the encoder. With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation. EfficientFCN achieves comparable or even better performance than state-of-the-art methods for semantic segmentation on the PASCAL Context, PASCAL VOC, and ADE20K datasets, with only 1/3 of their computational cost. Meanwhile, the proposed HGD-FPN achieves $>2\%$ higher mean Average Precision (mAP) when integrated into several object detection frameworks with ResNet-50 encoding backbones.