Paper Title
Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers
Paper Authors
Paper Abstract
With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design called double-condensing attention condensers that allows for highly condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes design constraints based on best practices for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10x faster than FB-Net C at higher accuracy and speed, and >10x faster than MobileOne-S1 at smaller size) while having a small model size (>1.37x smaller than MobileNetv3-L at higher accuracy and speed) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
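To make the double-condensing idea more concrete, the following is a minimal PyTorch sketch of what such a block could look like: an attention condenser learns self-attention on a spatially condensed embedding of its input and uses the expanded result to selectively modulate that input, and the double-condensing variant condenses twice before the embedding so the attention computation runs on a much smaller feature map. This is a hypothetical reconstruction from the abstract, not the authors' AttendNeXt implementation; the class name, the `mid_channels` parameter, and the specific pooling, convolution, and upsampling choices are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class DoubleCondensingAttentionCondenser(nn.Module):
    """Illustrative sketch of a double-condensing attention condenser.

    NOTE: a hypothetical reconstruction from the paper's abstract, not
    the authors' released architecture; all layer choices are assumptions.
    """

    def __init__(self, channels: int, mid_channels: int):
        super().__init__()
        # Two condensation stages: each halves the spatial resolution,
        # giving a 4x-condensed map for the attention embedding.
        self.condense = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Lightweight embedding that learns attention over the condensed map.
        self.embed = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=3, padding=1),
        )
        # Expansion back to the input resolution (undoes the 4x condensing).
        self.expand = nn.Upsample(scale_factor=4, mode="nearest")
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input spatial dims are assumed divisible by 4 for a clean round trip.
        attn = self.gate(self.expand(self.embed(self.condense(x))))
        return x * attn  # selective attention: rescale the input features


if __name__ == "__main__":
    block = DoubleCondensingAttentionCondenser(channels=64, mid_channels=16)
    x = torch.randn(1, 64, 32, 32)
    print(block(x).shape)  # -> torch.Size([1, 64, 32, 32])
```

The intuition behind the efficiency claim is visible in the sketch: the convolutional embedding, where most of the compute lives, operates at 1/16 of the input's spatial area, while the attention it produces still gates every input position after expansion.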