Paper Title

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection

Paper Authors

Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Paper Abstract

Recent empirical works show that large deep neural networks are often highly redundant and one can find much smaller subnetworks without a significant drop of accuracy. However, most existing methods of network pruning are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and if network pruning can be provably better than direct training using gradient descent. We answer these problems positively by proposing a simple greedy selection approach for finding good subnetworks, which starts from an empty network and greedily adds important neurons from the large network. This differs from the existing methods based on backward elimination, which remove redundant neurons from the large network. Theoretically, applying the greedy selection strategy on sufficiently large pre-trained networks guarantees to find small subnetworks with lower loss than networks directly trained with gradient descent. Our results also apply to pruning randomly weighted networks. Practically, we improve prior arts of network pruning on learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model, instead of re-training from new random initialization as suggested in \citet{liu2018rethinking}.
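
To make the greedy forward selection idea concrete, below is a minimal Python sketch for pruning a two-layer network under squared loss. It is an illustrative reading of the abstract, not the authors' implementation: the function name `greedy_forward_selection`, the use of precomputed per-neuron outputs, and the mean-squared-error objective are assumptions of this sketch. The subnetwork starts empty and repeatedly adds the neuron from the large pre-trained network that most reduces the loss of the averaged subnetwork.

```python
import numpy as np

def greedy_forward_selection(neuron_outputs, y, k):
    """Illustrative sketch of greedy forward selection for pruning.

    neuron_outputs : (N, n) array; entry [i, j] is the output of neuron i of the
                     large pre-trained network on training point j (precomputed).
    y              : (n,) array of regression targets.
    k              : number of neurons to select (with replacement, as in the
                     paper's two-layer analysis).

    The pruned subnetwork predicts the average output of the selected neurons.
    """
    N, n = neuron_outputs.shape
    selected = []                  # indices of neurons chosen so far
    running_sum = np.zeros(n)      # summed outputs of the chosen neurons

    for step in range(1, k + 1):
        # Loss obtained by adding each candidate neuron to the current subnetwork.
        candidate_preds = (running_sum + neuron_outputs) / step   # (N, n)
        losses = np.mean((candidate_preds - y) ** 2, axis=1)      # (N,)
        best = int(np.argmin(losses))                             # greedy pick
        selected.append(best)
        running_sum += neuron_outputs[best]

    return selected, running_sum / k
```

In this reading, the selected subnetwork would then be fine-tuned rather than re-trained from a fresh random initialization, which is the regime the abstract argues for.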
