Waymo Open Dataset: Panoramic Video Panoptic Segmentation

Paper title

Waymo Open Dataset: Panoramic Video Panoptic Segmentation

Authors

Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar, Dragomir Anguelov

Abstract

Panoptic image segmentation is the computer vision task of finding groups of pixels in an image and assigning semantic classes and object instance identifiers to them. Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving. The research community therefore relies on publicly available benchmark datasets to advance the state of the art in computer vision. Due to the high cost of densely labeling images, however, there is a shortage of publicly available ground truth labels that are suitable for panoptic segmentation. The high labeling costs also make it challenging to extend existing datasets to the video domain and to multi-camera setups. We therefore present the Waymo Open Dataset: Panoramic Video Panoptic Segmentation Dataset, a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. We generate our dataset using the publicly available Waymo Open Dataset, leveraging the diverse set of camera images. Our labels are consistent over time for video processing and consistent across multiple cameras mounted on the vehicles for full panoramic scene understanding. Specifically, we offer labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images. To the best of our knowledge, this makes our dataset an order of magnitude larger than existing datasets that offer video panoptic segmentation labels. We further propose a new benchmark for Panoramic Video Panoptic Segmentation and establish a number of strong baselines based on the DeepLab family of models. We will make the benchmark and the code publicly available. Find the dataset at https://waymo.com/open.
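
To make the task definition in the abstract concrete, the sketch below packs a per-pixel semantic class and instance identifier into a single panoptic label. The abstract does not describe the dataset's actual storage format, so the `semantic * label_divisor + instance` encoding (a common convention in panoptic segmentation tooling), the divisor value, the function names, and the class ids are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[int]]  # a tiny image stored as rows of integer labels

# Assumed value: it only needs to exceed the largest instance id in an image.
LABEL_DIVISOR = 1000


def encode_panoptic(semantic: Grid, instance: Grid,
                    label_divisor: int = LABEL_DIVISOR) -> Grid:
    """Combine semantic class ids and instance ids into one label per pixel."""
    if len(semantic) != len(instance):
        raise ValueError("semantic and instance maps differ in height")
    encoded: Grid = []
    for sem_row, inst_row in zip(semantic, instance):
        if len(sem_row) != len(inst_row):
            raise ValueError("semantic and instance maps differ in width")
        row = []
        for sem, inst in zip(sem_row, inst_row):
            if not 0 <= inst < label_divisor:
                raise ValueError("instance id does not fit below label_divisor")
            row.append(sem * label_divisor + inst)
        encoded.append(row)
    return encoded


def decode_panoptic(panoptic: Grid,
                    label_divisor: int = LABEL_DIVISOR) -> Tuple[Grid, Grid]:
    """Split panoptic labels back into (semantic, instance) maps."""
    semantic = [[label // label_divisor for label in row] for row in panoptic]
    instance = [[label % label_divisor for label in row] for row in panoptic]
    return semantic, instance


# Hypothetical class ids: 0 = road, 1 = car, 2 = building.
semantic_map = [[0, 0, 1],
                [2, 2, 1]]
# Instance id 0 marks pixels without an instance; the two car pixels
# belong to different cars (instances 1 and 2).
instance_map = [[0, 0, 1],
                [0, 0, 2]]

panoptic_map = encode_panoptic(semantic_map, instance_map)
```

In this toy layout the two car pixels share a semantic class but receive different panoptic labels, which is the distinction between semantic segmentation and panoptic segmentation. Labels that are consistent over time and across cameras, as the abstract describes, would additionally require the same object to keep the same instance id in every frame and camera view where it appears.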
