Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

Paper Title

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

Authors

K R Prajwal, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman

Abstract

The goal of this work is to detect and recognize sequences of letters signed using fingerspelling in British Sign Language (BSL). Previous fingerspelling recognition methods have not focused on BSL, which has a very different signing alphabet (e.g., two-handed instead of one-handed) to American Sign Language (ASL). They also use manual annotations for training. In contrast to previous methods, our method only uses weak annotations from subtitles for training. We localize potential instances of fingerspelling using a simple feature similarity method, then automatically annotate these instances by querying subtitle words and searching for corresponding mouthing cues from the signer. We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities. We employ a multi-stage training approach, where we use an initial version of our trained model to extend and enhance our training data before re-training to achieve better performance. Through extensive evaluations, we verify our method for automatic annotation and our model architecture. Moreover, we provide a human expert annotated test set of 5K video clips for evaluating BSL fingerspelling recognition methods to support sign language research.
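The abstract does not say which features or which similarity measure are used to localize fingerspelling. Purely as an illustration of the general idea, the sketch below assumes per-frame feature vectors, a small set of exemplar fingerspelling features, cosine similarity, and hypothetical `threshold` and `min_len` values. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def localize_segments(
    frame_feats: Sequence[Sequence[float]],
    exemplars: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical value
    min_len: int = 3,        # hypothetical minimum segment length in frames
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the best cosine
    similarity to any exemplar stays >= threshold for at least min_len frames."""
    scores = [max(cosine(f, e) for e in exemplars) for f in frame_feats]
    segments: List[Tuple[int, int]] = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the last frame.
    for t, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = t  # a candidate fingerspelling run begins
        elif s < threshold and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    return segments
```

The resulting candidate segments would then still need labels; per the abstract, those come from subtitle words matched against the signer's mouthing cues, which this sketch does not cover.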

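A multiple-hypothesis CTC loss scores a clip against several candidate spellings instead of a single label. The minimal pure-Python sketch below computes the standard CTC negative log-likelihood with the forward recursion, then assumes the per-hypothesis losses are combined by taking the minimum, so the model is only pushed toward its best-matching candidate. The paper's exact combination rule may differ (a log-sum-exp over hypotheses is another plausible choice), and `ctc_nll` and `multi_hypothesis_ctc` are hypothetical names.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target`, given per-frame log-probabilities
    log_probs[t][k] over the vocabulary (blank included). Requires T >= 1."""
    ext = [blank]  # interleave blanks: [b, y1, b, y2, ..., b]
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def multi_hypothesis_ctc(log_probs, hypotheses, blank: int = 0) -> float:
    """Assumed combination rule: loss of the best-matching hypothesis."""
    return min(ctc_nll(log_probs, h, blank) for h in hypotheses)
```

In a real training loop, a batched implementation such as PyTorch's `torch.nn.functional.ctc_loss` with `reduction="none"` could compute the per-hypothesis losses before taking the minimum; the explicit recursion here only makes the computation visible.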