THORN: Temporal Human-Object Relation Network for Action Recognition

Paper Title

THORN: Temporal Human-Object Relation Network for Action Recognition

Authors

Mohammed Guermal, Rui Dai, Francois Bremond

Abstract

Most action recognition models treat human activities as unitary events. However, human activities often follow a certain hierarchy, and many of them are compositional. Moreover, these actions are mostly human-object interactions. In this paper, we propose to recognize human actions by leveraging the set of interactions that define an action. We present THORN, an end-to-end network that leverages important human-object and object-object interactions to predict actions. The model is built on top of a 3D backbone network. Its key components are: 1) an object representation filter for modeling objects; 2) an object relation reasoning module to capture object relations; and 3) a classification layer to predict the action labels. To show the robustness of THORN, we evaluate it on EPIC-Kitchen55 and EGTEA Gaze+, two of the largest and most challenging first-person, human-object interaction datasets. THORN achieves state-of-the-art performance on both datasets.

Paper Title