Paper Title
Grounded Affordance from Exocentric View
Paper Authors
Paper Abstract
Affordance grounding aims to locate objects' "action possibilities" regions, which is an essential step toward embodied intelligence. Due to the diversity of interactive affordances, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels. Humans have the ability to transform various exocentric interactions into invariant egocentric affordances, countering the impact of interaction diversity. To empower an agent with such ability, this paper proposes the task of affordance grounding from the exocentric view, i.e., given exocentric human-object interaction images and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image, using only the affordance label as supervision. However, there is some "interaction bias" between personas, mainly regarding different regions and different views. To this end, we devise a cross-view affordance knowledge transfer framework that extracts affordance-specific features from exocentric interactions and transfers them to the egocentric view. Specifically, the perception of affordance regions is enhanced by preserving affordance co-relations. In addition, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from $36$ affordance categories. Experimental results demonstrate that our method outperforms representative models in terms of both objective metrics and visual quality. Code is released at https://github.com/lhc1224/Cross-view-affordance-grounding.
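To make the weakly supervised, cross-view setting described in the abstract more concrete, below is a minimal PyTorch sketch of the training signal: a shared encoder processes exocentric interaction images and egocentric object images, image-level affordance labels supervise both branches, and an alignment term pulls egocentric activations toward exocentric ones. The class and function names (`CrossViewAffordanceNet`, `transfer_loss`), the ResNet-18 backbone, and the MSE alignment term are illustrative assumptions, not the authors' architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class CrossViewAffordanceNet(nn.Module):
    """Minimal sketch (hypothetical): a shared backbone produces per-affordance
    activation maps, and global average pooling yields image-level logits so
    that only affordance labels are needed as supervision."""

    def __init__(self, num_affordances: int = 36):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Drop the avgpool/fc head to keep a spatial feature map.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # 1x1 conv gives one activation map per affordance class.
        self.classifier = nn.Conv2d(512, num_affordances, kernel_size=1)

    def forward(self, image):
        feat = self.encoder(image)          # (B, 512, h, w)
        cam = self.classifier(feat)         # (B, K, h, w) class activation maps
        logits = cam.mean(dim=(2, 3))       # image-level logits via global average pooling
        return cam, logits


def transfer_loss(exo_cam, ego_cam, labels, exo_logits, ego_logits):
    """Assumed objective: affordance classification on both views plus an
    alignment term that transfers exocentric activation patterns to the
    egocentric branch (a stand-in for the paper's knowledge-transfer and
    co-relation preservation terms)."""
    cls = F.cross_entropy(exo_logits, labels) + F.cross_entropy(ego_logits, labels)
    align = F.mse_loss(torch.sigmoid(ego_cam), torch.sigmoid(exo_cam).detach())
    return cls + 0.5 * align


if __name__ == "__main__":
    model = CrossViewAffordanceNet(num_affordances=36)
    exo = torch.randn(2, 3, 224, 224)    # exocentric human-object interaction images
    ego = torch.randn(2, 3, 224, 224)    # egocentric object images
    labels = torch.tensor([3, 17])       # image-level affordance labels (only supervision)
    exo_cam, exo_logits = model(exo)
    ego_cam, ego_logits = model(ego)
    loss = transfer_loss(exo_cam, ego_cam, labels, exo_logits, ego_logits)
    loss.backward()
    print(float(loss))
```

At test time, the egocentric activation map for the queried affordance class would serve as the grounded region; the specific grounding head and co-relation loss used by the paper are described in the full text, not here.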