A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs

Paper title

A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs

Authors

Juntao Tan, Changkyu Song, Abdeslam Boularias

Abstract

This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images. The proposed system receives as input several unlabeled videos of scenes containing various objects. The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video. The system then constructs a weighted graph that connects sequences based on the similarities between the objects that they contain. The similarity between two sequences of objects is measured by using generic visual features, after automatically re-arranging the frames in the two sequences to align the viewpoints of the objects. The graph is used to sample triplets of similar and dissimilar examples by performing random walks. The triplet examples are finally used to train a siamese neural network that projects the generic visual features into a low-dimensional manifold. Experiments on three public datasets, YCB-Video, CORe50 and RGBD-Object, show that the projected low-dimensional features improve the accuracy of clustering unknown objects into novel categories, and outperform several recent unsupervised clustering techniques.
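The triplet-sampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the walk length, the number of walks, or the rule for choosing positives and negatives, so `steps`, `walks`, the "most visited / least visited" rule, the function names, and the toy graph are all assumptions made for illustration. Nodes stand for tracked object sequences and edge weights stand for the similarity between two sequences.

```python
import random
from collections import Counter

def random_walk(graph, start, steps, rng):
    """Walk up to `steps` edges from `start`, picking each next node
    with probability proportional to the edge weight (similarity)."""
    node = start
    for _ in range(steps):
        neighbors = list(graph[node])
        if not neighbors:
            break
        weights = [graph[node][n] for n in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
    return node

def sample_triplet(graph, anchor, steps=3, walks=200, seed=0):
    """Return (anchor, positive, negative).

    Assumed rule: the positive is the node where walks from the anchor
    end most often, the negative is the node where they end least often.
    """
    rng = random.Random(seed)
    visits = Counter(random_walk(graph, anchor, steps, rng) for _ in range(walks))
    candidates = [n for n in graph if n != anchor]
    positive = max(candidates, key=lambda n: visits[n])
    negative = min(candidates, key=lambda n: visits[n])
    return anchor, positive, negative

# Hypothetical toy graph: three mug sequences and two box sequences,
# strongly connected within a group and weakly connected across groups.
graph = {
    "mug_a": {"mug_b": 0.9, "mug_c": 0.9},
    "mug_b": {"mug_a": 0.9, "mug_c": 0.9},
    "mug_c": {"mug_a": 0.9, "mug_b": 0.9, "box_a": 0.05},
    "box_a": {"box_b": 0.9, "mug_c": 0.05},
    "box_b": {"box_a": 0.9},
}

triplet = sample_triplet(graph, "mug_a")
print(triplet)
```

In a pipeline like the one the abstract outlines, triplets of this kind would then feed a siamese network trained to pull the anchor and positive together and push the negative away in the low-dimensional feature space.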
