Paper Title

Scaling up sign spotting through sign language dictionaries

Paper Authors

Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

Paper Abstract

The focus of this work is $\textit{sign spotting}$ - given a video of an isolated sign, our task is to identify $\textit{whether}$ and $\textit{where}$ it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) $\textit{watching}$ existing footage which is sparsely labelled using mouthing cues; (2) $\textit{reading}$ associated subtitles (readily available translations of the signed content) which provide additional $\textit{weak-supervision}$; (3) $\textit{looking up}$ words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning. We validate the effectiveness of our approach on low-shot sign spotting benchmarks. In addition, we contribute a machine-readable British Sign Language (BSL) dictionary dataset of isolated signs, BSLDict, to facilitate study of this task. The dataset, models and code are available at our project page.
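The abstract names Noise Contrastive Estimation (NCE) and Multiple Instance Learning (MIL) as the principles that unify the three supervision sources, but does not spell out the objective. As a rough illustration only, here is a minimal MIL-NCE-style loss sketch in PyTorch; the function name `mil_nce_loss`, the tensor shapes, and the temperature are hypothetical choices, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def mil_nce_loss(video_emb, pos_bag, neg_bag, temperature=0.07):
    """MIL-NCE-style objective (sketch, not the paper's exact loss).

    video_emb: (D,) embedding of a continuous-signing window.
    pos_bag:   (P, D) bag of candidate positives, e.g. dictionary
               exemplars of a word appearing in the subtitle.
    neg_bag:   (N, D) negatives, e.g. exemplars of other words.
    The MIL relaxation: only *some* instance in pos_bag is expected
    to match the window, not all of them.
    """
    v = F.normalize(video_emb, dim=-1)
    pos = F.normalize(pos_bag, dim=-1) @ v / temperature  # (P,) similarities
    neg = F.normalize(neg_bag, dim=-1) @ v / temperature  # (N,) similarities

    # -log( sum_pos exp(s) / (sum_pos exp(s) + sum_neg exp(s)) )
    all_scores = torch.cat([pos, neg])
    return torch.logsumexp(all_scores, dim=0) - torch.logsumexp(pos, dim=0)

# Toy usage with random embeddings (D=256): five dictionary exemplars
# of one subtitle word versus a hundred exemplars of other words.
loss = mil_nce_loss(torch.randn(256), torch.randn(5, 256), torch.randn(100, 256))
print(loss.item())
```

Treating a word's dictionary exemplars as a bag rather than a single positive means only the best-matching exemplar has to align with the continuous footage, which is one plausible way to absorb the gap between isolated dictionary signs and co-articulated signing.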
