Paper Title
Embedding Recycling for Language Models
Paper Authors
Paper Abstract
Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have been proposed, their practical effectiveness is still unknown because existing evaluations consider very few models and do not adequately account for overhead costs. We perform an extensive evaluation of ER across eight different models (17 to 900 million parameters) and fourteen tasks in English. We show how a simple ER technique that caches activations from an intermediate layer of a pretrained model, and learns task-specific adapters on the later layers, is broadly effective. For the best-performing baseline in our experiments (DeBERTa-v2 XL), adding a precomputed cache results in a >90% speedup during training and 87-91% speedup for inference, with negligible impact on accuracy. Our analysis reveals important areas of future work.
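Below is a minimal, self-contained sketch of the cached-activation idea described in the abstract: run the frozen lower layers of an encoder once, cache the intermediate hidden states, and then train only small task-specific adapters (plus a classifier head) over frozen upper layers that consume the cache. This is not the authors' code; the toy PyTorch transformer layers, the bottleneck adapter design, and all names and dimensions are illustrative assumptions.

```python
# Sketch of embedding recycling (ER): cache an intermediate layer's activations,
# then train only adapters on the later (frozen) layers. Illustrative only.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck adapter with a residual connection (assumed design)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class RecycledUpperLayers(nn.Module):
    """Upper half of an encoder that consumes cached hidden states.

    The transformer layers stand in for the pretrained model's later layers and
    are kept frozen; only the adapters and the classifier head are trainable.
    """
    def __init__(self, dim: int, n_layers: int, n_heads: int = 8, n_classes: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.layers.requires_grad_(False)  # frozen "pretrained" upper layers
        self.adapters = nn.ModuleList([Adapter(dim) for _ in range(n_layers)])
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, cached_hidden):  # (batch, seq_len, dim)
        h = cached_hidden
        for layer, adapter in zip(self.layers, self.adapters):
            h = adapter(layer(h))
        return self.classifier(h.mean(dim=1))  # mean-pool tokens, then classify


# --- Offline pass: run the frozen lower layers once and cache layer k's output. ---
dim, total_layers, k = 256, 12, 6
lower = nn.ModuleList(
    [nn.TransformerEncoderLayer(dim, 8, batch_first=True) for _ in range(k)]
)
lower.requires_grad_(False)
lower.eval()

token_embeddings = torch.randn(4, 32, dim)  # stand-in for embedded input tokens
with torch.no_grad():
    h = token_embeddings
    for layer in lower:
        h = layer(h)
activation_cache = {"doc_0": h}  # in practice, persisted to disk and reused across runs

# --- Training/inference pass: reuse the cache; only the upper layers are executed. ---
model = RecycledUpperLayers(dim, n_layers=total_layers - k)
logits = model(activation_cache["doc_0"])
print(logits.shape)  # torch.Size([4, 2])
```

In this sketch the speedup comes from skipping the lower k layers on every subsequent training step and inference call; the cache is computed once per corpus, which is why the abstract stresses accounting for the overhead of building and storing it.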