Paper Title
Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
Paper Authors
Paper Abstract
The state-of-the-art object detection and image classification methods can perform impressively on more than 9K and 10K classes, respectively. In contrast, the number of classes in semantic segmentation datasets is relatively limited. This is not surprising given the restrictions caused by the lack of labeled data and the high computational demand of segmentation. In this paper, we propose a novel training methodology to train and scale existing semantic segmentation models to a large number of semantic classes without increasing the memory overhead. In our embedding-based scalable segmentation approach, we reduce the space complexity of the segmentation model's output from O(C) to O(1), propose an approximation method for the ground-truth class probability, and use it to compute the cross-entropy loss. The proposed approach is general and can be adopted by any state-of-the-art segmentation model to gracefully scale it to any number of semantic classes with only one GPU. Our approach achieves similar, and in some cases even better, mIoU on the Cityscapes, Pascal VOC, ADE20K, and COCO-Stuff10K datasets when adopted by the DeepLabV3+ model with different backbones. We demonstrate a clear benefit of our approach on a dataset with 1284 classes, bootstrapped from LVIS and COCO annotations, with three times better mIoU than the DeepLabV3+ model.
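To make the core idea concrete, below is a minimal, illustrative sketch of one plausible way an embedding-based head with an approximate (sampled) cross-entropy could look, assuming per-pixel embeddings are compared against class embeddings and only the ground-truth classes in the batch plus a random negative sample are scored. This is not the paper's exact method; all names (EMBED_DIM, NUM_NEG, class_embeddings, approx_cross_entropy) and the negative-sampling scheme are assumptions for illustration.

```python
# Illustrative sketch only: embedding-based segmentation output with a
# sampled-softmax-style approximation of the cross-entropy loss.
import torch
import torch.nn.functional as F

EMBED_DIM = 64      # per-pixel embedding size, independent of the class count C
NUM_CLASSES = 1284  # e.g., the LVIS/COCO-bootstrapped label set
NUM_NEG = 128       # number of randomly sampled negative classes (assumption)

# Per-class embeddings; could be learned or precomputed.
class_embeddings = torch.nn.Embedding(NUM_CLASSES, EMBED_DIM)

def approx_cross_entropy(pixel_emb, target, ignore_index=255):
    """pixel_emb: (B, EMBED_DIM, H, W) embeddings from any backbone/decoder.
    target: (B, H, W) integer class labels."""
    B, D, H, W = pixel_emb.shape
    emb = pixel_emb.permute(0, 2, 3, 1).reshape(-1, D)   # (N, D)
    tgt = target.reshape(-1)                              # (N,)
    valid = tgt != ignore_index
    emb, tgt = emb[valid], tgt[valid]

    # Instead of scoring all C classes (O(C) memory per pixel), score only the
    # ground-truth classes present in the batch plus a random negative sample.
    pos_classes = torch.unique(tgt)
    neg_classes = torch.randint(0, NUM_CLASSES, (NUM_NEG,))
    candidates = torch.unique(torch.cat([pos_classes, neg_classes]))

    cand_emb = class_embeddings(candidates)               # (K, D), K << C
    logits = emb @ cand_emb.t()                           # (N, K)

    # Remap ground-truth labels to indices within the candidate set.
    remap = torch.full((NUM_CLASSES,), -1, dtype=torch.long)
    remap[candidates] = torch.arange(candidates.numel())
    return F.cross_entropy(logits, remap[tgt])
```

In this sketch, the model's output tensor has a fixed channel count (EMBED_DIM) regardless of how many classes exist, and the per-batch logits are restricted to a small candidate set, which is how memory can stay roughly constant as the label space grows.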