Paper Title
Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy
Paper Authors
Paper Abstract
One major barrier to advancing aerial autonomy has been collecting large-scale aerial datasets for training machine learning models. Because real-world data collection through drone deployment is costly and time-consuming, there has been an increasing shift towards using synthetic data to train models for drone applications. However, to improve generalization and the transfer of models to the real world, two things have proven essential: increasing the diversity of the simulation environments so that a model is trained across many variations, and augmenting the training data. Current synthetic aerial data generation tools either lack data augmentation or rely heavily on manual effort or real samples to configure and generate diverse, realistic simulation scenes for data collection. These dependencies limit the scalability of the data generation workflow. Accordingly, balancing generalizability and scalability remains a major challenge in synthetic data generation. To address these gaps, we introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications. ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations to generate diverse aerial datasets for different training tasks. ASDA improves data generation workflow efficiency by providing a unified prompt-based interface over the integrated pipelines for flexible control. The procedural generative approach of our data augmentation is performant and adaptable to different simulation environments, training tasks, and data collection needs. We demonstrate the effectiveness of our method in automatically generating diverse datasets and show its potential for downstream performance optimization.
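The abstract describes ASDA only at the architectural level: a central collection engine, a scene-augmentation pipeline, a data-augmentation pipeline, and one prompt-based interface over both. The following is a minimal, hypothetical sketch of that arrangement, not the authors' implementation. Every name here (`CollectionEngine`, `parse_prompt`, `build_from_prompt`, the `key=value` prompt syntax, and the weather, time-of-day, and brightness steps) is an illustrative assumption, and a plain dictionary stands in for a rendered simulator capture. It shows how seeded procedural randomization keeps generated datasets diverse yet reproducible.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, object]
SceneStep = Callable[[random.Random, Dict[str, object]], None]
DataStep = Callable[[random.Random, Sample], Sample]


@dataclass
class CollectionEngine:
    """Hypothetical central engine: augment a scene, capture frames, augment each frame."""
    scene_steps: List[SceneStep] = field(default_factory=list)
    data_steps: List[DataStep] = field(default_factory=list)

    def run(self, num_scenes: int, frames_per_scene: int, seed: int) -> List[Sample]:
        rng = random.Random(seed)  # one seeded generator makes the whole run reproducible
        dataset: List[Sample] = []
        for scene_id in range(num_scenes):
            scene: Dict[str, object] = {"id": scene_id}
            for step in self.scene_steps:  # scene-augmentation pipeline
                step(rng, scene)
            for frame in range(frames_per_scene):
                # Stand-in for a simulator capture: records the scene state only.
                sample: Sample = {"scene": dict(scene), "frame": frame}
                for dstep in self.data_steps:  # data-augmentation pipeline
                    sample = dstep(rng, sample)
                dataset.append(sample)
        return dataset


def parse_prompt(prompt: str) -> Dict[str, str]:
    """Hypothetical prompt syntax: whitespace-separated key=value tokens."""
    return dict(tok.split("=", 1) for tok in prompt.split())


def make_weather_step(options: List[str]) -> SceneStep:
    def step(rng: random.Random, scene: Dict[str, object]) -> None:
        scene["weather"] = rng.choice(options)
    return step


def time_of_day(rng: random.Random, scene: Dict[str, object]) -> None:
    scene["hour"] = rng.uniform(6.0, 18.0)  # daylight hours only, as an example range


def brightness_jitter(rng: random.Random, sample: Sample) -> Sample:
    out = dict(sample)  # return a new sample rather than mutating the input
    out["brightness"] = round(rng.uniform(0.8, 1.2), 3)
    return out


def build_from_prompt(prompt: str) -> Tuple[CollectionEngine, int, int, int]:
    cfg = parse_prompt(prompt)
    engine = CollectionEngine(
        scene_steps=[make_weather_step(cfg.get("weather", "clear").split(",")), time_of_day],
        data_steps=[brightness_jitter],
    )
    return engine, int(cfg.get("scenes", 1)), int(cfg.get("frames", 1)), int(cfg.get("seed", 0))


engine, n, f, seed = build_from_prompt("scenes=3 frames=4 weather=rain,fog seed=7")
data = engine.run(n, f, seed)
```

Keeping scene steps and data steps as separate lists of plain callables mirrors the abstract's split into two scriptable pipelines: new augmentations can be registered without touching the engine, and rerunning with the same prompt and seed regenerates an identical dataset.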