Paper title

Fine-grained activity recognition for assembly videos

Paper authors

Jones, Jonathan D., Cortesa, Cathryn, Shelton, Amy, Landau, Barbara, Khudanpur, Sanjeev, Hager, Gregory D.

Paper abstract

In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires perception at a level of spatial detail that has not been attempted in the action recognition literature to date. We extend the fine-grained activity recognition setting to address the task of assembly action recognition in its full generality by unifying assembly actions and kinematic structures within a single framework. We use this framework to develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure. Finally, we evaluate our method empirically on two application-driven data sources: (1) An IKEA furniture-assembly dataset, and (2) A block-building dataset. On the first, our system recognizes assembly actions with an average framewise accuracy of 70% and an average normalized edit distance of 10%. On the second, which requires fine-grained geometric reasoning to distinguish between assemblies, our system attains an average normalized edit distance of 23% -- a relative improvement of 69% over prior work.
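The normalized edit distance reported in the abstract is, in the action-segmentation literature, typically the Levenshtein distance between the predicted and ground-truth sequences of action labels, normalized by sequence length and expressed as a percentage (lower is better). A minimal sketch of that metric is below; the function names are illustrative and the exact normalization used in the paper may differ:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (standard DP)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # cost of deleting all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # cost of inserting all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return d[m][n]

def normalized_edit_distance(ref, hyp):
    """Edit distance as a percentage of the longer sequence's length."""
    return 100.0 * edit_distance(ref, hyp) / max(len(ref), len(hyp), 1)
```

For example, comparing a ground-truth action sequence ["attach", "screw", "flip"] against a prediction ["attach", "flip"] gives one deletion, i.e. a normalized edit distance of 100/3 ≈ 33%.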
