Title
Probabilistic Representations for Video Contrastive Learning
Authors
Abstract
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation. We hypothesize that the clips composing a video have different distributions over short durations, but that combining them in a common embedding space can represent the complicated and sophisticated distribution of the whole video. Thus, the proposed method represents video clips as normal distributions and combines them into a Mixture of Gaussians to model the whole video distribution. By sampling embeddings from the whole video distribution, we can circumvent the careful sampling strategies or transformations otherwise needed to generate augmented views of the clips, unlike previous deterministic methods, which have mainly focused on such sample generation strategies for contrastive learning. We further propose a stochastic contrastive loss to learn proper video distributions and handle the uncertainty inherent in raw video. Experimental results verify that our probabilistic embedding is a state-of-the-art video representation learning method for action recognition and video retrieval on the most popular benchmarks, including UCF101 and HMDB51.
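The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each clip is summarized by a mean and a diagonal log-variance, that the mixture weights are uniform across clips, and that embeddings are drawn with the usual reparameterization (mean plus standard deviation times Gaussian noise). The function name `sample_from_video_mixture` and all shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def sample_from_video_mixture(means, log_vars, num_samples, rng):
    """Draw embeddings from a uniform-weight mixture of diagonal Gaussians.

    means, log_vars: arrays of shape (num_clips, dim), one Gaussian per clip.
    Returns (samples, clip_ids) with shapes (num_samples, dim) and (num_samples,).
    """
    num_clips, dim = means.shape
    # Pick one mixture component (clip) per sample.
    # Assumption: equal weights; the abstract does not state the weighting.
    clip_ids = rng.integers(0, num_clips, size=num_samples)
    # Reparameterized draw: mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal((num_samples, dim))
    std = np.exp(0.5 * log_vars[clip_ids])
    samples = means[clip_ids] + std * eps
    return samples, clip_ids


# Toy usage: 4 clips embedded in an 8-dimensional space (made-up values).
rng = np.random.default_rng(0)
means = rng.standard_normal((4, 8))
log_vars = np.full((4, 8), -2.0)
samples, clip_ids = sample_from_video_mixture(means, log_vars, 16, rng)
print(samples.shape)  # (16, 8)
```

In a contrastive setup, samples drawn from the same video's mixture would presumably serve as positives and samples from other videos as negatives; the stochastic contrastive loss itself is not specified in the abstract and is not sketched here.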