PSNet: Parallel Symmetric Network for Video Salient Object Detection

Paper Title

PSNet: Parallel Symmetric Network for Video Salient Object Detection

Authors

Runmin Cong, Weiyu Song, Jianjun Lei, Guanghui Yue, Yao Zhao, Sam Kwong

Abstract

For the video salient object detection (VSOD) task, how to extract information from the appearance modality and the motion modality has long been a central question. The two-stream structure, comprising an RGB appearance stream and an optical-flow motion stream, is widely used as the typical VSOD pipeline, but existing methods usually either use motion features only to guide appearance features unidirectionally, or fuse the two modalities adaptively but blindly. As a result, these methods underperform in diverse scenarios because their learning schemes are neither comprehensive nor specific. In this paper, following a more secure modeling philosophy, we investigate the importance of the appearance and motion modalities more comprehensively and propose a VSOD network with up-and-down parallel symmetry, named PSNet. Two parallel branches, each dominated by a different modality, achieve complete video saliency decoding through the cooperation of the Gather Diffusion Reinforcement (GDR) module and the Cross-modality Refinement and Complement (CRC) module. Finally, the Importance Perception Fusion (IPF) module fuses the features of the two branches according to their relative importance in different scenarios. Experiments on four benchmark datasets demonstrate that our method achieves desirable and competitive performance.
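The abstract does not say how the IPF module measures importance. As a rough illustration of the general idea of fusing two branch outputs by scene-dependent importance, here is a minimal NumPy sketch of importance-weighted fusion. It is not the paper's IPF implementation: the scoring scheme (global average pooling plus a linear projection), the function name `importance_weighted_fusion`, and the weight vectors `w_a` and `w_b` are all hypothetical assumptions.

```python
import numpy as np


def importance_weighted_fusion(feat_a, feat_b, w_a, w_b):
    """Fuse two (C, H, W) branch feature maps with input-dependent weights.

    Hypothetical sketch, not the PSNet IPF module: each branch is scored by
    global average pooling followed by a linear projection (w_a, w_b are
    assumed learned vectors of length C); a two-way softmax turns the scores
    into weights that sum to 1.
    """
    # Global average pooling over the spatial dims -> one (C,) descriptor per branch
    desc_a = feat_a.mean(axis=(1, 2))
    desc_b = feat_b.mean(axis=(1, 2))
    # Project each descriptor to a scalar importance logit
    logits = np.array([desc_a @ w_a, desc_b @ w_b])
    # Numerically stable softmax over the two branches
    exp = np.exp(logits - logits.max())
    weights = exp / exp.sum()
    # Weighted sum of the two branch features
    fused = weights[0] * feat_a + weights[1] * feat_b
    return fused, weights


# Usage with random stand-ins for the two branch outputs (8 channels, 4x4 map)
rng = np.random.default_rng(0)
appearance_dominant = rng.standard_normal((8, 4, 4))
motion_dominant = rng.standard_normal((8, 4, 4))
w_a = rng.standard_normal(8)
w_b = rng.standard_normal(8)
fused, weights = importance_weighted_fusion(appearance_dominant, motion_dominant, w_a, w_b)
print(fused.shape, weights)
```

The softmax keeps the two weights positive and summing to one, so the fusion can favor one branch per input instead of applying a fixed mix. Whether PSNet's IPF uses scalar, channel-wise, or spatial weights is not stated in the abstract.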
