Paper Title
PMVOS: Pixel-Level Matching-Based Video Object Segmentation
Paper Authors
Abstract
Semi-supervised video object segmentation (VOS) aims to segment arbitrary target objects in a video when the ground-truth segmentation mask of the initial frame is provided. Because this mask is the only prior knowledge available about the target object, feature matching, which compares template features representing the target object with input features, is an essential step. Recently, pixel-level matching (PM), which matches every pixel in the template features against every pixel in the input features, has been widely used for feature matching because of its high performance. However, despite its effectiveness, the information used to build the template features is limited to the initial and previous frames. We address this issue by proposing a novel method, PM-based video object segmentation (PMVOS), that constructs strong template features containing the information of all past frames. Furthermore, we apply self-attention to the similarity maps generated by PM to capture global dependencies. On the DAVIS 2016 validation set, we achieve new state-of-the-art performance among real-time methods (> 30 fps), with a J&F score of 85.6%. On the DAVIS 2017 and YouTube-VOS validation sets, PMVOS achieves J&F scores of 74.0% and 68.2%, respectively.
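
To make the idea of pixel-level matching concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes cosine similarity as the matching score and a maximum over foreground template pixels as the reduction; the function name `pixel_level_matching`, the array layouts, and the small `eps` constant are all hypothetical choices made for illustration. The abstract does not specify how PMVOS aggregates all past frames or applies self-attention, so neither is shown here.

```python
import numpy as np


def pixel_level_matching(template_feats, template_mask, input_feats, eps=1e-8):
    """Compare every input pixel with every foreground template pixel.

    template_feats: (C, Ht, Wt) features of a template frame (assumed layout)
    template_mask:  (Ht, Wt) binary mask, nonzero where the target object is
    input_feats:    (C, H, W) features of the current frame
    Returns an (H, W) similarity map holding, for each input pixel, its best
    cosine similarity to any foreground template pixel.
    """
    c, h, w = input_feats.shape
    t = template_feats.reshape(c, -1)  # (C, Nt) template pixels as columns
    q = input_feats.reshape(c, -1)     # (C, Nq) input pixels as columns

    # L2-normalise each pixel's feature vector so a dot product is a cosine.
    t = t / (np.linalg.norm(t, axis=0, keepdims=True) + eps)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)

    sim = t.T @ q                      # (Nt, Nq) all pairwise similarities

    fg = template_mask.reshape(-1).astype(bool)
    if not fg.any():
        # No foreground template pixels: return an all-zero map.
        return np.zeros((h, w), dtype=sim.dtype)

    # Keep only foreground template rows and take the best match per input pixel.
    return sim[fg].max(axis=0).reshape(h, w)
```

The similarity map has one value per input pixel, so it can be passed to later stages (for example a decoder) at the same spatial resolution as the input features. The all-pairs product costs memory proportional to the number of template pixels times the number of input pixels, which is one reason such matching is usually done on downsampled feature maps.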