Paper title
Kernelized Memory Network for Video Object Segmentation
Paper authors
Paper abstract
Semi-supervised video object segmentation (VOS) is the task of segmenting a target object throughout a video, given the ground-truth segmentation mask of that object in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising solution for semi-supervised VOS. However, an important point is overlooked when applying STM to VOS: the solution (STM) is non-local, but the problem (VOS) is predominantly local. To resolve this mismatch between STM and VOS, we propose a kernelized memory network (KMN). Before being trained on real videos, our KMN is pre-trained on static images, as in previous works. Unlike previous works, we use the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and extracting segment boundaries. The proposed KMN surpasses the state of the art on standard benchmarks by a significant margin (+5% on the DAVIS 2017 test-dev set). In addition, KMN runs at 0.12 seconds per frame on the DAVIS 2016 validation set and requires little extra computation compared with STM.
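
The abstract does not say how Hide-and-Seek is applied during pre-training. As a rough illustration of the general idea only (hiding random grid cells of a training image so that a network sees partially occluded objects), a minimal sketch might look like the following. The function name `hide_patches`, the grid size, the hiding probability, and the fill value are all assumptions made for this example and are not taken from the paper.

```python
import random


def hide_patches(image, grid=4, hide_prob=0.5, fill=0, rng=None):
    """Return a copy of `image` with random grid cells overwritten by `fill`.

    `image` is a 2D list of pixel values (rows of equal length).
    All parameter names and defaults are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]  # copy so the input stays untouched
    cell_h = -(-height // grid)  # ceiling division: cell height in pixels
    cell_w = -(-width // grid)   # ceiling division: cell width in pixels
    hidden = []  # top-left corners of the cells that were hidden
    for top in range(0, height, cell_h):
        for left in range(0, width, cell_w):
            # Each cell is hidden independently with probability hide_prob.
            if rng.random() < hide_prob:
                hidden.append((top, left))
                for y in range(top, min(top + cell_h, height)):
                    for x in range(left, min(left + cell_w, width)):
                        out[y][x] = fill
    return out, hidden


# Usage: an 8x8 image of ones split into a 4x4 grid of 2x2 cells.
image = [[1] * 8 for _ in range(8)]
masked, hidden = hide_patches(image, grid=4, hide_prob=0.5, rng=random.Random(0))
for row in masked:
    print("".join(str(v) for v in row))
```

Each cell is hidden independently, so the occlusion pattern changes on every call unless a seeded `random.Random` is passed in. In a segmentation setting one would presumably hide pixels in the input image while keeping the target mask intact, but that detail is also an assumption here.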