Paper Title
WAIR-D: Wireless AI Research Dataset
Authors
Abstract
It is widely recognized that datasets with high-quality samples play an important role in artificial intelligence (AI), machine learning (ML), and related studies. However, although AI/ML was introduced into wireless research long ago, few datasets are commonly used in the research community. Without a common dataset, AI-based methods proposed for wireless systems are hard to compare with traditional baselines or even with each other. Existing wireless AI research usually relies on datasets generated from statistical models or from ray-tracing simulations covering a limited number of environments. Statistical data prevent trained AI models from being fine-tuned for a specific scenario, while ray-tracing data from limited environments reduce the generalization capability of the trained models. In this paper, we present the Wireless AI Research Dataset (WAIR-D), which consists of two scenarios. Scenario 1 contains 10,000 environments with sparsely dropped user equipment (UEs), and Scenario 2 contains 100 environments with densely dropped UEs. The environments are randomly selected from real-world maps of more than 40 cities. The large volume of data ensures that trained AI models have good generalization capability, while fine-tuning can easily be carried out on a specific chosen environment. Moreover, WAIR-D provides both the wireless channels and the corresponding environmental information, so that communication mechanisms aided by extra information can be designed and evaluated. WAIR-D gives researchers benchmarks for comparing their designs and for reproducing the results of others. In this paper, we describe the construction of the dataset in detail and give examples of its use.