Paper Title

BERT Loses Patience: Fast and Robust Inference with Early Exit

Authors

Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei

Abstract

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM). To achieve this, our approach couples an internal classifier with each layer of a PLM and dynamically stops inference when the intermediate predictions of the internal classifiers remain unchanged for a pre-defined number of steps. Our approach improves inference efficiency because it allows the model to make a prediction with fewer layers. Meanwhile, experimental results with an ALBERT model show that our method can improve the accuracy and robustness of the model by preventing it from overthinking and by exploiting multiple classifiers for prediction, yielding a better accuracy-speed trade-off than existing early exit methods.
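The stopping rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `patience_early_exit`, its arguments, and the exact counting convention (a streak counter that grows when a layer's prediction matches the previous layer's and resets otherwise) are assumptions made for illustration. The per-layer predictions are passed as a lazy iterable, so layers after the exit point never need to be computed.

```python
from typing import Hashable, Iterable, Tuple


def patience_early_exit(
    layer_predictions: Iterable[Hashable], patience: int
) -> Tuple[Hashable, int]:
    """Return (prediction, layers_used) under a patience-based stopping rule.

    layer_predictions yields the label predicted by the internal classifier
    of layer 1, layer 2, ... in order (hypothetical interface).
    patience is the number of consecutive unchanged predictions required
    before inference stops (assumed counting convention).
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")

    previous = None
    streak = 0        # consecutive layers that agreed with the layer before
    layers_used = 0

    for prediction in layer_predictions:
        layers_used += 1
        if layers_used > 1 and prediction == previous:
            streak += 1
        else:
            streak = 0  # the prediction changed, so start counting again
        previous = prediction
        if streak >= patience:
            break       # stable for `patience` steps: exit early

    if layers_used == 0:
        raise ValueError("at least one layer prediction is required")

    # If the streak never reached `patience`, this is the last layer's output.
    return previous, layers_used
```

For example, `patience_early_exit([0, 1, 1, 1, 0, 0], patience=2)` returns `(1, 4)`: the prediction stays at `1` for two consecutive steps after layer 2, so inference stops at layer 4 and the last two layers are skipped. In a real model the iterable would be a generator that runs one transformer layer and its internal classifier per step.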
