Competitive Simplicity for Multi-Task Learning for Real-Time Foggy Scene Understanding via Domain Adaptation

Paper Title

Competitive Simplicity for Multi-Task Learning for Real-Time Foggy Scene Understanding via Domain Adaptation

Authors

Naif Alshammari, Samet Akcay, Toby P. Breckon

Abstract

Automotive scene understanding under adverse weather conditions raises a realistic and challenging problem attributable to poor outdoor scene visibility (e.g. foggy weather). However, because most contemporary scene understanding approaches are applied under ideal-weather conditions, such approaches may not provide genuinely optimal performance when compared to established a priori insights on extreme-weather understanding. In this paper, we propose a complex but competitive multi-task learning approach capable of performing in real-time semantic scene understanding and monocular depth estimation under foggy weather conditions by leveraging both recent advances in adversarial training and domain adaptation. As an end-to-end pipeline, our model provides a novel solution to surpass degraded visibility in foggy weather conditions by transferring scenes from foggy to normal using a GAN-based model. For optimal performance in semantic segmentation, our model generates depth to be used as complementary source information with RGB in the segmentation network. We provide a robust method for foggy scene understanding by training two models (normal and foggy) simultaneously with shared weights (each model is trained on each weather condition independently). Our model incorporates RGB colour, depth, and luminance images via distinct encoders with dense connectivity and features fusing, and leverages skip connections to produce consistent depth and segmentation predictions. Using this architectural formulation with light computational complexity at inference time, we are able to achieve comparable performance to contemporary approaches at a fraction of the overall model complexity.
