Paper Title
Virtual to Real Adaptation of Pedestrian Detectors
Paper Authors
Paper Abstract
Pedestrian detection through Computer Vision is a building block for a multitude of applications. Recently, there has been increasing interest in Convolutional Neural Network-based architectures for performing this task. A critical goal of these supervised networks is to generalize the knowledge learned during the training phase to new scenarios with different characteristics, and a suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires substantial human effort and is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V - Grand Theft Auto V, where annotations are automatically acquired. However, when trained solely on the synthetic dataset, the model experiences a Synthetic2Real Domain Shift, leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two different Domain Adaptation techniques suited to the pedestrian detection task but possibly applicable to general object detection. Experiments show that the network trained with ViPeD can generalize to unseen real-world scenarios better than a detector trained on real-world data, exploiting the variety of our synthetic dataset. Furthermore, we demonstrate that our Domain Adaptation techniques reduce the Synthetic2Real Domain Shift, bringing the two domains closer and yielding a performance improvement when the network is tested on real-world images. The code, models, and dataset are freely available at https://ciampluca.github.io/viped/
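The abstract does not detail the two Domain Adaptation techniques. As a rough illustration of how synthetic and real data might be combined when training a pedestrian detector, the sketch below mixes a ViPeD-like synthetic set with a small real-world set in a single training loop, using a COCO-pretrained torchvision Faster R-CNN purely as a stand-in; the detector choice, toy dataset classes, and hyper-parameters are assumptions for illustration, not the authors' actual setup.

# Minimal PyTorch sketch (requires torchvision >= 0.13); toy data stands in for
# ViPeD renderings and real annotated images.
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

class ToyDetectionDataset(Dataset):
    # Stand-in for (image, boxes) pairs; replace with real ViPeD / real-world loaders.
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        image = torch.rand(3, 480, 640)                       # fake RGB frame
        boxes = torch.tensor([[100.0, 100.0, 200.0, 300.0]])  # one fake pedestrian box (x1, y1, x2, y2)
        labels = torch.ones((1,), dtype=torch.int64)          # class 1 = person in COCO labeling
        return image, {"boxes": boxes, "labels": labels}

def collate(batch):
    # Detection targets have a variable number of boxes, so keep them as lists.
    return tuple(zip(*batch))

synthetic_ds = ToyDetectionDataset(64)   # would be GTA V frames with automatic boxes
real_ds = ToyDetectionDataset(16)        # would be a small manually annotated real set

loader = DataLoader(ConcatDataset([synthetic_ds, real_ds]),
                    batch_size=4, shuffle=True, collate_fn=collate)

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")   # COCO-pretrained stand-in detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

model.train()
for images, targets in loader:
    losses = model(list(images), list(targets))  # training mode returns a dict of losses
    total = sum(losses.values())
    optimizer.zero_grad()
    total.backward()
    optimizer.step()

In a real setup the detection head would typically be replaced with a two-class (background + pedestrian) predictor and the synthetic/real sampling ratio tuned; this sketch only shows the generic idea of letting both domains contribute gradients during training.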