Paper Title
Quantifying the Impact of Label Noise on Federated Learning
Paper Authors
Paper Abstract
Federated Learning (FL) is a distributed machine learning paradigm in which clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on developing FL algorithms that tackle data heterogeneity across clients, the equally important issue of data quality (e.g., label noise) in FL has been overlooked. This paper aims to fill this gap by providing a quantitative study of the impact of label noise on FL. We derive an upper bound on the generalization error that is linear in the clients' label noise level. We then conduct experiments on the MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that global model accuracy decreases linearly as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows the convergence of FL training, and that the global model tends to overfit when the noise level is high.
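The abstract does not specify how label noise is injected at a given level. As a minimal sketch, assuming the common symmetric (uniform) flipping model, a per-client corruption step for experiments like those on MNIST/CIFAR-10 might look like the following; the function name and interface are illustrative, not from the paper:

```python
import numpy as np

def flip_labels(labels, noise_level, num_classes, seed=None):
    """Symmetric label noise: flip a fraction `noise_level` of labels,
    each to a different class chosen uniformly at random.

    This is an assumed noise model for illustration; the paper's exact
    protocol may differ.
    """
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n = len(noisy)
    # Choose which samples to corrupt (without replacement).
    idx = rng.choice(n, size=int(noise_level * n), replace=False)
    for i in idx:
        # Replace with any class other than the current one,
        # so every selected label is guaranteed to change.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy
```

Each client would apply this to its local training labels with its own noise level before training, which is what makes a controlled, per-client sweep of noise levels (and the reported linear accuracy degradation) measurable.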