Paper Title

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention

Paper Authors

Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Hakkani-Tur

Paper Abstract

The massive amount of trainable parameters in the pre-trained language models (PLMs) makes them hard to be deployed to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper looks at existing methods along this line through the \textit{kernel lens}. Motivated by the connection between self-attention in transformer-based PLMs and kernel learning, we propose \textit{kernel-wise adapters}, namely \textit{Kernel-mix}, that utilize the kernel structure in self-attention to guide the assignment of the tunable parameters. These adapters use guidelines found in classical kernel learning and enable separate parameter tuning for each attention head. Our empirical results, over a diverse set of natural language generation and understanding tasks, show that our proposed adapters can attain or improve the strong performance of existing baselines.
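For context, the self-attention/kernel-learning connection the abstract refers to is the standard view of softmax attention as a kernel smoother. The sketch below is not taken from the paper itself; the symbols $q_i$, $k_j$, $v_j$, $d$, and $\kappa$ are conventional notation rather than the authors' definitions.

$$
\mathrm{Attn}(q_i) \;=\; \sum_{j} \frac{\kappa(q_i, k_j)}{\sum_{j'} \kappa(q_i, k_{j'})}\, v_j,
\qquad
\kappa(q, k) \;=\; \exp\!\left(\frac{q^{\top} k}{\sqrt{d}}\right).
$$

Under this view, each attention head carries its own kernel, which is what the abstract appeals to when it says the kernel structure guides the assignment of tunable parameters: the proposed Kernel-mix adapters tune parameters separately per head rather than sharing one adapter across the whole attention block.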
