Paper Title
Improving Generalization in Federated Learning by Seeking Flat Minima
Paper Authors
Paper Abstract
Models trained in federated settings often suffer from degraded performance and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of those optimizers across a variety of benchmark vision datasets (e.g., CIFAR10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).
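As a rough illustration of the two ingredients the abstract describes, the Python/PyTorch sketch below shows (a) a single SAM step as a client might apply it during local training and (b) server-side FedAvg aggregation followed by a running SWA-style average of successive global models. All function names, hyper-parameters (`lr`, `rho`), and structural choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): client-side SAM step plus
# server-side FedAvg and SWA-style averaging of global models.
# Hyper-parameters and helper names are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def sam_local_step(model, batch, lr=0.01, rho=0.05):
    """One Sharpness-Aware Minimization step on a client's mini-batch."""
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    # Ascent step: perturb weights towards the worst-case point in a rho-ball.
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # Gradient at the perturbed point, then descend from the original weights.
    loss_perturbed = F.cross_entropy(model(x), y)
    grads_p = torch.autograd.grad(loss_perturbed, list(model.parameters()))
    with torch.no_grad():
        for p, e, g in zip(model.parameters(), eps, grads_p):
            p.sub_(e)         # restore the original weights
            p.sub_(lr * g)    # SGD update with the sharpness-aware gradient

def fedavg(client_models, client_sizes):
    """Weighted average of client model parameters (FedAvg)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_models[0].state_dict())
    for k in avg:
        avg[k] = sum(m.state_dict()[k] * (n / total)
                     for m, n in zip(client_models, client_sizes))
    return avg

def swa_update(swa_state, new_state, n_averaged):
    """Running (SWA-style) average of successive global models on the server."""
    if swa_state is None:
        return copy.deepcopy(new_state), 1
    for k in swa_state:
        swa_state[k] = swa_state[k] + (new_state[k] - swa_state[k]) / (n_averaged + 1)
    return swa_state, n_averaged + 1
```

In this reading, each communication round would run `sam_local_step` for several local iterations on the selected clients, aggregate with `fedavg`, and, once averaging is enabled, fold the new global model into the SWA state with `swa_update`; the SWA weights would then serve as the final, flatter model.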