Paper Title
Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers
Paper Authors
Paper Abstract
With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design called double-condensing attention condensers that allows for highly condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes design constraints based on best practices for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10x faster than FB-Net C at higher accuracy and speed, and >10x faster than MobileOne-S1 at smaller size) while having a small model size (>1.37x smaller than MobileNetv3-L at higher accuracy and speed) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
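To make the double-condensing idea more concrete, the following is a minimal PyTorch sketch of what such a block could look like: an attention condenser learns self-attention on a spatially condensed embedding of its input and uses the expanded result to selectively modulate that input, and the double-condensing variant condenses twice before the embedding so the attention computation runs on a much smaller feature map. This is a hypothetical reconstruction from the abstract, not the authors' AttendNeXt implementation; the class name, the `mid_channels` parameter, and the specific pooling, convolution, and upsampling choices are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class DoubleCondensingAttentionCondenser(nn.Module):
    """Illustrative sketch of a double-condensing attention condenser.

    NOTE: a hypothetical reconstruction from the paper's abstract, not
    the authors' released architecture; all layer choices are assumptions.
    """

    def __init__(self, channels: int, mid_channels: int):
        super().__init__()
        # Two condensation stages: each halves the spatial resolution,
        # giving a 4x-condensed map for the attention embedding.
        self.condense = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Lightweight embedding that learns attention over the condensed map.
        self.embed = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=3, padding=1),
        )
        # Expansion back to the input resolution (undoes the 4x condensing).
        self.expand = nn.Upsample(scale_factor=4, mode="nearest")
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input spatial dims are assumed divisible by 4 for a clean round trip.
        attn = self.gate(self.expand(self.embed(self.condense(x))))
        return x * attn  # selective attention: rescale the input features


if __name__ == "__main__":
    block = DoubleCondensingAttentionCondenser(channels=64, mid_channels=16)
    x = torch.randn(1, 64, 32, 32)
    print(block(x).shape)  # -> torch.Size([1, 64, 32, 32])
```

The intuition behind the efficiency claim is visible in the sketch: the convolutional embedding, where most of the compute lives, operates at 1/16 of the input's spatial area, while the attention it produces still gates every input position after expansion.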