Paper Title
Accelerating Federated Learning in Heterogeneous Data and Computational Environments
Paper Authors
Abstract
There are situations where data relevant to a machine learning problem are distributed among multiple locations that cannot share the data due to regulatory, competitiveness, or privacy reasons. Examples include data present in users' cellphones, manufacturing data of companies in a given industrial sector, or medical records located at different hospitals. Moreover, participating sites often have different data distributions and computational capabilities. Federated Learning provides an approach to learn a joint model over all the available data in these environments. In this paper, we introduce a novel distributed validation weighting scheme (DVW), which evaluates the performance of a learner in the federation against a distributed validation set. Each learner reserves a small portion (e.g., 5%) of its local training examples as a validation dataset and allows other learners' models to be evaluated against it. We empirically show that DVW results in better performance compared to established methods, such as FedAvg, both under synchronous and asynchronous communication protocols in data and computationally heterogeneous environments.
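The abstract contrasts DVW's validation-based weighting with FedAvg's training-set-size weighting. As a rough illustration of the difference, the sketch below aggregates per-learner model parameters under each scheme. The abstract does not specify DVW's exact weighting function, so normalizing validation accuracies into proportional weights is an assumption here, and all function names and the toy data are illustrative.

```python
import numpy as np

def fedavg_aggregate(models, num_examples):
    """FedAvg baseline: weight each learner's model by its local
    training-set size (models: list of per-learner lists of layer arrays)."""
    w = np.asarray(num_examples, dtype=float)
    w /= w.sum()
    return [sum(wi * layer for wi, layer in zip(w, layers))
            for layers in zip(*models)]

def dvw_aggregate(models, val_accuracies):
    """DVW-style aggregation (illustrative): weight each learner's model
    by its accuracy on the distributed validation set, i.e. the held-out
    ~5% slices that every learner exposes to evaluate the others' models.
    The proportional normalization is an assumption, not the paper's exact rule."""
    w = np.asarray(val_accuracies, dtype=float)
    w /= w.sum()
    return [sum(wi * layer for wi, layer in zip(w, layers))
            for layers in zip(*models)]

# Toy usage: two learners, each "model" a single layer of two parameters.
models = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
print(fedavg_aggregate(models, [100, 100])[0])   # equal sizes -> plain mean: [2. 2.]
print(dvw_aggregate(models, [0.9, 0.1])[0])      # better validator dominates: [1.2 1.2]
```

With equal local dataset sizes, FedAvg reduces to a plain mean, whereas DVW shifts the aggregate toward the learner whose model generalizes better on the federation's held-out validation data.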