Paper Title


DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths

Paper Authors

Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao

Paper Abstract


Over-parameterization is ubiquitous nowadays in training neural networks, benefiting both optimization, in seeking global optima, and generalization, in reducing prediction error. However, compressive networks are desired in many real-world applications, and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models into compressive ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models from simple to complex ones by coupling a pair of parameters, so as to simultaneously train over-parameterized deep models and structural sparsity on the weights of fully connected and convolutional layers. Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI), for which a global convergence analysis in deep learning is established: from any initialization, the algorithmic iterations converge to a critical point of the empirical risk. Experimental evidence shows that DessiLBI achieves comparable or even better performance than competitive optimizers in exploring the structural sparsity of several widely used backbones on benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: effective sparse structures with test accuracy comparable to that of fully trained over-parameterized models.
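To give a concrete sense of the "simple discretization" mentioned above, below is a minimal sketch of one linearized Bregman iteration on the coupled pair (W, Γ), assuming a quadratic coupling term ||W − Γ||² / (2ν) between the dense weights and their sparse companion, and using a plain element-wise ℓ1 soft-threshold as the proximal map (the paper targets structural rather than element-wise sparsity, e.g. on whole filters or neurons). The function names and hyperparameters (`alpha`, `kappa`, `nu`) are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def soft_threshold(z, lam):
    """Element-wise soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lbi_step(W, Gamma, Z, grad_loss, alpha=0.1, kappa=1.0, nu=10.0):
    """One linearized Bregman iteration on the coupled variables (sketch).

    W         : dense (over-parameterized) weights, updated by gradient descent on
                the loss augmented with the coupling term ||W - Gamma||^2 / (2*nu).
    Gamma     : sparse companion variable that traces the regularization path.
    Z         : accumulator driving Gamma through the proximal map.
    grad_loss : callable returning the gradient of the empirical risk w.r.t. W.
    """
    # Gradient step on the dense weights, with the quadratic coupling to Gamma.
    grad_W = grad_loss(W) + (W - Gamma) / nu
    W_next = W - kappa * alpha * grad_W

    # Accumulate the coupling gradient for Gamma in Z ...
    Z_next = Z + alpha * (W - Gamma) / nu
    # ... and map it back through soft-thresholding; an entry of Gamma stays
    # exactly zero until its accumulated evidence in Z exceeds the threshold,
    # so important weights enter the sparse model earlier along the path.
    Gamma_next = kappa * soft_threshold(Z_next, 1.0)

    return W_next, Gamma_next, Z_next
```

Iterating this step from Γ = Z = 0 produces the family of models from simple (very sparse Γ) to complex described in the abstract; early stopping along this path is what yields the "winning tickets".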
