Paper Title
Hybrid Dynamic-static Context-aware Attention Network for Action Assessment in Long Videos
Paper Authors
Paper Abstract
The objective of action quality assessment is to score sports videos. However, most existing works focus only on video dynamic information (i.e., motion information) while ignoring the specific postures that an athlete performs in a video, which are important for action assessment in long videos. In this work, we present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos. To learn more discriminative representations of videos, we not only learn the video dynamic information but also focus on the static postures of detected athletes in specific frames, which represent the action quality at certain moments, aided by the proposed hybrid dynamic-static architecture. Moreover, we leverage a context-aware attention module, consisting of a temporal instance-wise graph convolutional network unit and an attention unit, for both streams to extract more robust stream features, where the former explores the relations between instances and the latter assigns a proper weight to each instance. Finally, we combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts. Additionally, we have collected and annotated the new Rhythmic Gymnastics dataset, which contains videos of four different types of gymnastics routines, for evaluating action quality assessment in long videos. Extensive experimental results validate the efficacy of our proposed method, which outperforms related approaches. The code and dataset are available at \url{https://github.com/lingan1996/ACTION-NET}.
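
To make the described architecture concrete, the following is a minimal PyTorch sketch of the abstract's two-stream design: each stream passes its instance features through a context-aware attention module (an instance-wise graph convolution followed by an attention unit), and the pooled stream features are concatenated and regressed to a score. The layer sizes, the similarity-based adjacency, and the input feature dimensions are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

import torch
import torch.nn as nn

class ContextAwareAttention(nn.Module):
    # Sketch of the context-aware attention module: a temporal
    # instance-wise graph convolution that relates instances, then an
    # attention unit that weights each instance before pooling.
    def __init__(self, in_dim, hidden_dim=256):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hidden_dim)   # graph-convolution projection
        self.attn = nn.Linear(hidden_dim, 1)       # per-instance attention score

    def forward(self, x):
        # x: (batch, num_instances, in_dim) clip- or frame-level features.
        # Adjacency from feature similarity (an assumption; the paper may
        # instead build the graph from temporal neighborhoods).
        adj = torch.softmax(x @ x.transpose(1, 2), dim=-1)
        h = torch.relu(self.gcn(adj @ x))            # propagate over the graph
        w = torch.softmax(self.attn(h), dim=1)       # weight for each instance
        return (w * h).sum(dim=1)                    # pooled stream feature

class ActionNetSketch(nn.Module):
    # Two-stream fusion and score regression, as described in the abstract.
    def __init__(self, dyn_dim=1024, stat_dim=2048, hidden_dim=256):
        super().__init__()
        self.dynamic_stream = ContextAwareAttention(dyn_dim, hidden_dim)
        self.static_stream = ContextAwareAttention(stat_dim, hidden_dim)
        self.regressor = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),                # final video score
        )

    def forward(self, dynamic_feats, static_feats):
        fused = torch.cat([self.dynamic_stream(dynamic_feats),
                           self.static_stream(static_feats)], dim=-1)
        return self.regressor(fused).squeeze(-1)

# Usage with random tensors standing in for motion-clip features (dynamic
# stream) and per-frame athlete-posture features (static stream); training
# would minimize a regression loss against the experts' scores.
model = ActionNetSketch()
dynamic = torch.randn(2, 64, 1024)   # 2 videos, 64 temporal instances
static = torch.randn(2, 32, 2048)    # 2 videos, 32 sampled frames
print(model(dynamic, static).shape)  # torch.Size([2])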