Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization

Paper Title

Few-shot Action Recognition with Implicit Temporal Alignment and Pair Similarity Optimization

Authors

Congqi Cao, Yajuan Li, Qinyi Lv, Peng Wang, Yanning Zhang

Abstract

Few-shot learning aims to recognize instances from novel classes with few labeled samples, which has great value in research and application. Although there has been a lot of work in this area recently, most of the existing work is based on image classification tasks. Video-based few-shot action recognition has not been explored well and remains challenging: 1) the differences of implementation details among different papers make a fair comparison difficult; 2) the wide variations and misalignment of temporal sequences make the video-level similarity comparison difficult; 3) the scarcity of labeled data makes the optimization difficult. To solve these problems, this paper presents 1) a specific setting to evaluate the performance of few-shot action recognition algorithms; 2) an implicit sequence-alignment algorithm for better video-level similarity comparison; 3) an advanced loss for few-shot learning to optimize pair similarity with limited data. Specifically, we propose a novel few-shot action recognition framework that uses long short-term memory following 3D convolutional layers for sequence modeling and alignment. Circle loss is introduced to maximize the within-class similarity and minimize the between-class similarity flexibly towards a more definite convergence target. Instead of using random or ambiguous experimental settings, we set a concrete criterion analogous to the standard image-based few-shot learning setting for few-shot action recognition evaluation. Extensive experiments on two datasets demonstrate the effectiveness of our proposed method.
