Paper Title

Boosted Dynamic Neural Networks

Authors

Haichao Yu, Haoxiang Li, Gang Hua, Gao Huang, Humphrey Shi

Abstract

Early-exiting dynamic neural networks (EDNNs), one type of dynamic neural network, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model exits at either the last prediction head or an intermediate prediction head whose prediction confidence is higher than a predefined threshold. To optimize the model, these prediction heads, together with the network backbone, are trained on every batch of training data. This brings a train-test mismatch problem: all the prediction heads are optimized on all types of data in the training phase, while the deeper heads only see difficult inputs in the testing phase. Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting and propose multiple training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show that it achieves state-of-the-art performance on the CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
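
To make the two mechanisms in the abstract concrete, below is a minimal PyTorch-style sketch, not the authors' released code: a backbone split into sequential stages with one prediction head per stage, where each deeper head contributes an additive correction to the logits of the shallower exits (the gradient-boosting view), and inference exits at the first head whose softmax confidence clears a threshold. All names here (`BoostedEarlyExitNet`, `stages`, `heads`, `threshold`) and the detach-based residual objective are illustrative assumptions; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BoostedEarlyExitNet(nn.Module):
    """Early-exiting network whose exits form an additive (boosted) ensemble."""

    def __init__(self, stages: nn.ModuleList, heads: nn.ModuleList):
        super().__init__()
        assert len(stages) == len(heads)
        self.stages = stages  # backbone pieces, executed sequentially
        self.heads = heads    # one classifier per stage

    def forward(self, x):
        """Return cumulative logits for every exit, shallow to deep."""
        ensemble_logits = None
        outputs = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            correction = head(x)
            if ensemble_logits is None:
                ensemble_logits = correction
            else:
                # Boosting view: each deeper head fits a correction on top of
                # the ensemble of shallower heads. Detaching the running sum
                # is one plausible way to make each head target the current
                # residual; the paper's exact objective may differ.
                ensemble_logits = ensemble_logits.detach() + correction
            outputs.append(ensemble_logits)
        return outputs


def training_loss(model, x, y):
    """Supervise every exit; the running sum couples them additively."""
    return sum(F.cross_entropy(logits, y) for logits in model(x))


@torch.no_grad()
def early_exit_predict(model, x, threshold=0.9):
    """Budgeted inference for a single input (batch size 1): stop at the
    first exit whose max softmax probability clears the threshold."""
    ensemble_logits, feat = None, x
    for stage, head in zip(model.stages, model.heads):
        feat = stage(feat)
        correction = head(feat)
        ensemble_logits = (correction if ensemble_logits is None
                           else ensemble_logits + correction)
        confidence, prediction = F.softmax(ensemble_logits, dim=1).max(dim=1)
        if confidence.item() >= threshold:
            break  # confident enough: skip the remaining, deeper stages
    return prediction, ensemble_logits
```

In budgeted-batch mode, the exit threshold is typically calibrated on a validation set so the whole test batch meets a target compute budget; the sketch uses a single fixed threshold only for brevity.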
