Title
Add a SideNet to your MainNet
Authors
Abstract
As the performance and popularity of deep neural networks have increased, so too has their computational cost. There are many effective techniques for reducing a network's computational footprint (quantisation, pruning, knowledge distillation), but these lead to models whose computational cost is the same regardless of their input. Our human reaction times vary with the complexity of the tasks we perform: easier tasks (e.g. telling apart dogs from boats) are executed much faster than harder ones (e.g. telling apart two similar-looking breeds of dog). Driven by this observation, we develop a method for adaptive network complexity by attaching a small classification layer, which we call a SideNet, to a large pretrained network, which we call a MainNet. Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user-determined threshold, and passes the input along to the large MainNet for further processing only if its confidence is too low. This allows us to flexibly trade off the network's performance against its computational cost. Experimental results show that simple single-hidden-layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks. We also highlight three other desirable properties of our method: the classifications obtained by SideNets are calibrated, they are complementary to other compute-reduction techniques, and they enable easy exploration of the compute-accuracy space.
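The routing rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `sidenet` and `mainnet` are hypothetical stand-in callables that return class logits, and the threshold value is the user-determined confidence cutoff.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_classify(x, sidenet, mainnet, threshold):
    """Early-exit inference: return the SideNet's prediction when its
    softmax confidence exceeds the threshold; otherwise fall back to
    the larger, more expensive MainNet."""
    probs = softmax(sidenet(x))
    confidence = max(probs)
    if confidence >= threshold:
        return probs.index(confidence), "sidenet"
    probs = softmax(mainnet(x))
    return probs.index(max(probs)), "mainnet"

# Toy stand-ins (hypothetical): an "easy" input yields a peaked SideNet
# distribution and exits early; a "hard" one falls through to the MainNet.
confident_sidenet = lambda x: [4.0, 0.0, 0.0]
unsure_sidenet = lambda x: [1.0, 0.8, 0.0]
mainnet = lambda x: [0.0, 3.0, 0.0]

label, route = adaptive_classify(None, confident_sidenet, mainnet, threshold=0.9)
```

Sweeping the threshold from 0 to 1 moves the model along the compute-accuracy trade-off: a low threshold answers almost everything with the cheap SideNet, while a threshold near 1 defers almost everything to the MainNet.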