Paper Title
Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces
Paper Authors
Paper Abstract
Current state-of-the-art Neural Architecture Search (NAS) methods neither scale efficiently to multiple hardware platforms nor handle diverse architectural search spaces. To remedy this, we present DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid, scalable, and diverse NAS that extends to many user scenarios. DONNA consists of three phases. First, an accuracy predictor is built using blockwise knowledge distillation from a reference model. This predictor enables searching across diverse networks with varying macro-architectural parameters, such as layer types and attention mechanisms, as well as across micro-architectural parameters, such as block repeats and expansion rates. Second, a rapid evolutionary search finds a set of Pareto-optimal architectures for any scenario using the accuracy predictor and on-device measurements. Third, the optimal models are quickly fine-tuned to training-from-scratch accuracy. DONNA is up to 100x faster than MnasNet at finding state-of-the-art on-device architectures. On ImageNet classification, DONNA architectures are 20% faster than EfficientNet-B0 and MobileNetV2 on an NVIDIA V100 GPU, and 10% faster with 0.5% higher accuracy than MobileNetV2-1.4x on a Samsung S20 smartphone. Beyond NAS, DONNA is also used for search-space extension and exploration, as well as hardware-aware model compression.
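To make the three phases concrete, below is a minimal, self-contained toy sketch of how they could fit together. It is an illustrative assumption, not the authors' code: the per-block distillation losses and on-device latencies are random stand-ins, the accuracy predictor is a simple least-squares linear model over per-block losses, and all names (`features`, `evolve`, `budget_ms`, etc.) are invented for this sketch.

```python
"""Toy sketch of the three DONNA phases on a synthetic search space.
Assumptions (not from the paper): 4 block slots, 3 candidate blocks per
slot, synthetic per-block distillation losses and latencies."""
import random
import numpy as np

rng = np.random.default_rng(0)
NUM_SLOTS, NUM_CHOICES = 4, 3

# Phase 1a (stand-in): in DONNA, each candidate block is trained via
# blockwise knowledge distillation to mimic the corresponding block of a
# reference model; the resulting per-block quality metrics feed the
# accuracy predictor. Here those metrics are random numbers.
block_loss = rng.uniform(0.1, 1.0, size=(NUM_SLOTS, NUM_CHOICES))
block_latency = rng.uniform(1.0, 5.0, size=(NUM_SLOTS, NUM_CHOICES))  # ms

def features(arch):
    """Per-block distillation losses of an architecture (tuple of choices)."""
    return np.array([block_loss[s, c] for s, c in enumerate(arch)])

def latency(arch):
    """Stand-in for an on-device latency measurement."""
    return sum(block_latency[s, c] for s, c in enumerate(arch))

# Phase 1b: fit a linear accuracy predictor on a few sampled architectures.
# The "ground-truth" accuracy is a synthetic oracle standing in for
# training from scratch: lower distillation loss -> higher accuracy.
def true_accuracy(arch):
    return 80.0 - 5.0 * features(arch).sum() + rng.normal(0, 0.1)

samples = [tuple(rng.integers(0, NUM_CHOICES, NUM_SLOTS)) for _ in range(20)]
X = np.stack([features(a) for a in samples])
X1 = np.hstack([X, np.ones((len(samples), 1))])  # append a bias column
y = np.array([true_accuracy(a) for a in samples])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predicted_accuracy(arch):
    return features(arch) @ w[:-1] + w[-1]

# Phase 2: evolutionary search for the best predicted accuracy under a
# scenario-specific latency budget (the on-device constraint).
def evolve(budget_ms=12.0, pop=32, gens=30):
    popl = [tuple(rng.integers(0, NUM_CHOICES, NUM_SLOTS)) for _ in range(pop)]
    for _ in range(gens):
        feasible = [a for a in popl if latency(a) <= budget_ms]
        feasible.sort(key=predicted_accuracy, reverse=True)
        parents = feasible[: max(2, pop // 4)] or popl[:2]
        children = []
        while len(children) < pop:
            a = list(random.choice(parents))
            a[int(rng.integers(NUM_SLOTS))] = int(rng.integers(NUM_CHOICES))
            children.append(tuple(a))  # mutate one block choice
        popl = parents + children
    feasible = [a for a in popl if latency(a) <= budget_ms]
    return max(feasible or popl, key=predicted_accuracy)

best = evolve()
# Phase 3 would quickly fine-tune `best` to full accuracy; here we report.
print(best, f"pred acc {predicted_accuracy(best):.2f}%, lat {latency(best):.1f} ms")
```

The key design point the sketch reflects is that the expensive step (blockwise distillation) is paid once per search space, while each new hardware scenario only requires cheap predictor evaluations and latency measurements inside the evolutionary loop.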