Untrimmed Action Anticipation

Paper Title

Untrimmed Action Anticipation

Authors

Ivan Rodin, Antonino Furnari, Dimitrios Mavroeidis, Giovanni Maria Farinella

Abstract

Egocentric action anticipation consists of predicting, from egocentric video, a future action that the camera wearer will perform. While the task has recently attracted the attention of the research community, current approaches assume that the input videos are "trimmed", meaning that a short video sequence is sampled a fixed time before the beginning of the action. We argue that, despite the recent advances in the field, trimmed action anticipation has limited applicability in real-world scenarios, where it is important to deal with "untrimmed" video inputs and where it cannot be assumed that the exact moment at which the action will begin is known at test time. To overcome these limitations, we propose an untrimmed action anticipation task, which, similarly to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before the actions actually take place. We design an evaluation procedure for methods that address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Experiments show that the performance of current models designed for trimmed action anticipation is very limited and that more research on this task is required.
