Paper Title
Intra-Source Style Augmentation for Improved Domain Generalization
Paper Authors
Paper Abstract
Generalization with respect to domain shifts, as they frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an intra-source style augmentation (ISSA) method to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model learns to faithfully reconstruct the image, preserving its semantic layout, through noise prediction. Random masking of the estimated noise enables the style mixing capability of our model, i.e., it allows altering the global appearance without affecting the semantic layout of an image. Using the proposed masked noise encoder to randomize style and content combinations in the training set, ISSA effectively increases the diversity of training data and reduces spurious correlations. As a result, we achieve up to $12.4\%$ mIoU improvement on driving-scene semantic segmentation under different types of data shifts, i.e., changing geographic locations, adverse weather conditions, and day to night. ISSA is model-agnostic and straightforwardly applicable to CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $3\%$ mIoU on Cityscapes to Dark Zürich.
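The abstract names two mechanisms: randomly masking the encoder's predicted noise maps, and mixing the style of one training image with the content (semantic layout) of another. The paper's encoder operates on StyleGAN2 feature maps; the NumPy sketch below is only a schematic illustration of those two ideas with made-up shapes, and every function name here is hypothetical, not from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)


def mask_noise(noise_maps, mask_ratio=0.5, rng=rng):
    """Randomly zero out a fraction of each predicted noise map.

    Schematically, masking part of the encoder's noise prediction leaves
    those regions to the generator, which is what makes style mixing
    possible without destroying the reconstructed semantic layout.
    """
    masked = []
    for n in noise_maps:
        keep = rng.random(n.shape) > mask_ratio  # boolean keep-mask
        masked.append(n * keep)
    return masked


def style_mix(content_latent, style_latent, n_style_layers=8):
    """Combine the coarse (layout) layers of one W+ latent with the fine
    (appearance) layers of another, in the usual StyleGAN layer-swap sense.

    Shapes are illustrative: (num_layers, latent_dim) per image.
    """
    mixed = content_latent.copy()
    mixed[-n_style_layers:] = style_latent[-n_style_layers:]
    return mixed


# Illustrative usage with toy tensors (real inputs would be StyleGAN2
# noise maps and inverted W+ latents of Cityscapes images).
noise_maps = [rng.standard_normal((8, 8)) for _ in range(3)]
masked = mask_noise(noise_maps, mask_ratio=0.5)

content_w = np.zeros((16, 512))  # stands in for image A's inverted latent
style_w = np.ones((16, 512))     # stands in for image B's inverted latent
mixed_w = style_mix(content_w, style_w, n_style_layers=8)
```

In this toy setup, `mixed_w` keeps image A's coarse layers (layout) and takes image B's fine layers (appearance), mirroring how ISSA recombines styles and contents within one source dataset.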