Paper Title
Multi-layer Feature Aggregation for Deep Scene Parsing Models
Authors
Abstract
Scene parsing from images is a fundamental yet challenging problem in visual content understanding. In this dense prediction task, the parsing model assigns a categorical label to every pixel, which requires contextual information from adjacent image patches. The challenge of this learning task is therefore to describe the geometric and semantic properties of objects or a scene simultaneously. In this paper, we explore how to use the multi-layer feature outputs of deep parsing networks effectively for spatial-semantic consistency. We design a novel feature aggregation module that generates an appropriate global representation prior and thereby improves the discriminative power of the features. The proposed module automatically selects intermediate visual features to correlate spatial and semantic information. At the same time, the multiple skip connections provide strong supervision, making the deep parsing network easy to train. Extensive experiments on four public scene parsing datasets show that a deep parsing network equipped with the proposed feature aggregation module achieves very promising results.