Paper Title
Pruning's Effect on Generalization Through the Lens of Training and Regularization
Paper Authors
Paper Abstract
Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning's effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss over the dense model. We find that pruning also leads to additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples over the dense model. Pruning extends model training time and reduces model size. These two factors improve training and add regularization respectively. We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization.
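For readers unfamiliar with the "standard pruning algorithms" the abstract refers to, the following is a minimal sketch of global magnitude pruning in PyTorch: the weights with the smallest absolute values across the network are zeroed until a target sparsity is reached. This is an illustrative assumption, not the authors' exact procedure; the function name `magnitude_prune` and the rule for which tensors count as prunable are choices made here for brevity.

```python
# Minimal sketch of global magnitude pruning (one common "standard pruning
# algorithm"). Assumes PyTorch; the paper's exact setup may differ.
import torch
import torch.nn as nn


def magnitude_prune(model: nn.Module, sparsity: float) -> dict[str, torch.Tensor]:
    """Zero out the `sparsity` fraction of smallest-magnitude weights globally.

    Returns binary masks so pruned weights can be held at zero during any
    subsequent retraining.
    """
    # Treat multi-dimensional parameters (conv/linear weights) as prunable;
    # skip biases and normalization parameters.
    weights = {name: p for name, p in model.named_parameters() if p.dim() > 1}

    # Rank all prunable weights by magnitude across the whole model.
    all_scores = torch.cat([p.detach().abs().flatten() for p in weights.values()])
    k = int(sparsity * all_scores.numel())
    threshold = all_scores.kthvalue(k).values if k > 0 else torch.tensor(-1.0)

    masks = {}
    with torch.no_grad():
        for name, p in weights.items():
            mask = (p.abs() > threshold).to(p.dtype)
            p.mul_(mask)  # apply the mask in place
            masks[name] = mask
    return masks
```

In the setting the abstract describes, a prune step like this would typically be followed by further retraining of the surviving weights (as in iterative magnitude pruning), which is the sense in which pruning both extends effective training time and reduces model size.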