Paper Title
THORN: Temporal Human-Object Relation Network for Action Recognition
Paper Authors
Abstract
Most action recognition models treat human activities as unitary events. However, human activities often follow a hierarchy. In fact, many human activities are compositional, and most of them involve human-object interactions. In this paper, we propose to recognize a human action by leveraging the set of interactions that define it. We present THORN, an end-to-end network that leverages important human-object and object-object interactions to predict actions. The model is built on top of a 3D backbone network. Its key components are: 1) an object representation filter for modeling objects; 2) an object relation reasoning module to capture object relations; and 3) a classification layer to predict the action labels. To show the robustness of THORN, we evaluate it on EPIC-Kitchens-55 and EGTEA Gaze+, two of the largest and most challenging first-person human-object interaction datasets. THORN achieves state-of-the-art performance on both datasets.
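The three components listed in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every detail is an assumption made for illustration: a random array stands in for the 3D backbone output, the object representation filter is approximated by average-pooling features inside each bounding box, and relation reasoning uses a generic pairwise relation-network formulation (a shared ReLU-linear function summed over ordered object pairs). The function names, box format, and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def pool_object_features(feature_map, boxes):
    """Average-pool backbone features inside each box (assumed object filter).

    feature_map: (T, H, W, C) array standing in for 3D-backbone output.
    boxes: list of (y0, y1, x0, x1) cell ranges on the feature grid
           (hypothetical format, half-open ranges).
    Returns an (N, C) array with one feature vector per object.
    """
    return np.stack([
        feature_map[:, y0:y1, x0:x1, :].mean(axis=(0, 1, 2))
        for (y0, y1, x0, x1) in boxes
    ])


def relation_reasoning(objects, w_rel):
    """Sum a shared ReLU-linear relation function over ordered object pairs."""
    n = objects.shape[0]
    pairs = [
        np.concatenate([objects[i], objects[j]])
        for i in range(n) for j in range(n) if i != j
    ]
    hidden = np.maximum(np.stack(pairs) @ w_rel, 0.0)  # (N*(N-1), R)
    return hidden.sum(axis=0)  # (R,)


def classify(relation_vec, w_cls):
    """Linear layer followed by a softmax over action classes."""
    logits = relation_vec @ w_cls
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


# Toy sizes and random weights, chosen only for illustration.
T, H, W, C, R, NUM_ACTIONS = 4, 7, 7, 16, 32, 5
feature_map = rng.standard_normal((T, H, W, C))
boxes = [(0, 3, 0, 3), (2, 6, 2, 6), (4, 7, 1, 5)]  # e.g. hand, knife, onion
w_rel = rng.standard_normal((2 * C, R)) * 0.1
w_cls = rng.standard_normal((R, NUM_ACTIONS)) * 0.1

objects = pool_object_features(feature_map, boxes)
probs = classify(relation_reasoning(objects, w_rel), w_cls)
print(probs.shape, float(probs.sum()))
```

In this sketch the hand is treated as one more entry in the object set, so human-object and object-object pairs pass through the same relation function. Whether THORN models them separately is not stated in the abstract.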