Paper Title
Memory Efficient Temporal & Visual Graph Model for Unsupervised Video Domain Adaptation
Paper Authors
Paper Abstract
Existing video domain adaptation (DA) methods need to store all temporal combinations of video frames or pair source and target videos, which is memory-expensive and does not scale to long videos. To address these limitations, we propose a memory-efficient graph-based video DA approach. First, our method models each source or target video as a graph: nodes represent video frames, and edges represent the temporal or visual-similarity relationships between frames. We use a graph attention network to learn the weights of individual frames and simultaneously align the source and target videos in a domain-invariant graph feature space. Instead of storing a large number of sub-videos, our method constructs only one graph per video with a graph attention mechanism, reducing the memory cost substantially. Extensive experiments show that, compared with state-of-the-art methods, our approach achieves superior performance while significantly reducing the memory cost.
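To make the frame-graph idea concrete, the following is a minimal PyTorch sketch, not the paper's implementation: it builds one adjacency matrix per video from temporal adjacency plus a cosine-similarity threshold, then applies a single simplified graph-attention layer to learn per-frame weights. The function names, the similarity threshold, and the feature dimensions are illustrative assumptions.

```python
# Illustrative sketch only: one graph per video, built from temporal and
# visual-similarity edges, followed by a simplified graph-attention layer.
import torch
import torch.nn.functional as F


def build_video_graph(frame_feats: torch.Tensor, sim_threshold: float = 0.8) -> torch.Tensor:
    """Return a (T, T) boolean adjacency mask for one video.

    frame_feats: (T, D) per-frame features (e.g. from a CNN backbone).
    Edges connect temporally adjacent frames and visually similar frames.
    """
    T = frame_feats.size(0)
    adj = torch.eye(T, dtype=torch.bool)            # self-loops
    idx = torch.arange(T - 1)
    adj[idx, idx + 1] = True                        # temporal edges (forward)
    adj[idx + 1, idx] = True                        # temporal edges (backward)
    sim = F.cosine_similarity(frame_feats.unsqueeze(1), frame_feats.unsqueeze(0), dim=-1)
    adj |= sim > sim_threshold                      # visual-similarity edges
    return adj


class FrameGraphAttention(torch.nn.Module):
    """One graph-attention layer over the frame graph (simplified GAT-style)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim, bias=False)
        self.attn = torch.nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, frame_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.proj(frame_feats)                  # (T, out_dim)
        T = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(T, T, -1),
                           h.unsqueeze(0).expand(T, T, -1)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))
        scores = scores.masked_fill(~adj, float("-inf"))  # attend only along edges
        alpha = torch.softmax(scores, dim=-1)       # learned per-frame weights
        return alpha @ h                            # aggregated frame features


# Usage: memory grows with the number of frames T, not with frame combinations.
frames = torch.randn(16, 512)                       # 16 frames, 512-d features
adj = build_video_graph(frames)
video_repr = FrameGraphAttention(512, 256)(frames, adj).mean(dim=0)
```

In this sketch the graph-level video representation (mean of attended frame features) is what a domain-alignment loss would act on; the actual alignment objective used in the paper is not reproduced here.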