Paper Title
KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing
Paper Authors
Paper Abstract
Part-level attribute parsing is a fundamental but challenging task, which requires region-level visual understanding to provide explainable details of body parts. Most existing approaches address this problem by adding a regional convolutional neural network (RCNN) with an attribute prediction head to a two-stage detector, in which attributes of body parts are identified from local part boxes. However, local part boxes with limited visual cues (i.e., part appearance only) lead to unsatisfying parsing results, since attributes of body parts are highly dependent on comprehensive relations among them. In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute ``above-the-hip'' for a shirt requires visual/geometric relations between the shirt and the hip) and explicit knowledge (e.g., the part ``shorts'' cannot have the attribute ``hoodie'' or ``lining''). Specifically, the KE-RCNN consists of two novel components, i.e., an Implicit Knowledge based Encoder (IK-En) and an Explicit Knowledge based Decoder (EK-De). The former is designed to enhance part-level representations by encoding part-part relational contexts into part boxes, and the latter is proposed to decode attributes under the guidance of prior knowledge about \textit{part-attribute} relations. In this way, the KE-RCNN is plug-and-play and can be integrated into any two-stage detector, e.g., Attribute-RCNN, Cascade-RCNN, HRNet-based RCNN, and SwinTransformer-based RCNN. Extensive experiments conducted on two challenging benchmarks, i.e., Fashionpedia and Kinetics-TPS, demonstrate the effectiveness and generalizability of the KE-RCNN. In particular, it achieves consistent improvements over all existing methods, reaching around 3% higher AP on Fashionpedia and around 4% higher Acc on Kinetics-TPS.
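To make the explicit-knowledge idea concrete, the following is a minimal sketch of how a \textit{part-attribute} prior could gate an attribute prediction head: incompatible attributes have their logits suppressed before the per-attribute sigmoid. This is an assumption about one plausible realization; the part/attribute names and the prior table below are illustrative placeholders, not the actual Fashionpedia label set or the paper's EK-De implementation.

```python
import numpy as np

# Illustrative vocabularies (hypothetical, not the real label set).
PARTS = ["shirt", "shorts"]
ATTRIBUTES = ["above-the-hip", "hoodie", "lining"]

# Binary part-attribute prior: True = attribute is admissible for the part.
# E.g., "shorts" can never take "above-the-hip" or "hoodie".
PRIOR = np.array([
    [True,  True,  True],   # shirt
    [False, False, True],   # shorts
])

NEG = -100.0  # large negative constant; sigmoid(NEG) is effectively 0


def mask_attribute_logits(logits, part_id, prior=PRIOR):
    """Suppress logits of attributes the prior rules out for this part."""
    masked = logits.copy()
    masked[~prior[part_id]] = NEG
    return masked


logits = np.array([2.0, 1.5, -0.5])            # raw head outputs for one box
masked = mask_attribute_logits(logits, PARTS.index("shorts"))
probs = 1.0 / (1.0 + np.exp(-masked))          # per-attribute sigmoid
# "above-the-hip" and "hoodie" are driven to ~0; only "lining" survives
```

Applied inside a two-stage detector, such a mask would sit between the attribute head and the loss/inference step, so impossible part-attribute pairs never contribute to the prediction.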