Paper Title
$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning
Paper Authors
Paper Abstract
With the success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Although some parameter-efficient tuning paradigms have been proposed to address this problem, they still require large resources to compute the gradients in the training phase. In this paper, we propose $\mathcal{Y}$-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. $\mathcal{Y}$-tuning learns dense representations for the labels $\mathcal{Y}$ defined in a given task and aligns them to fixed feature representations. Without tuning the features of the input text or the model parameters, $\mathcal{Y}$-tuning is both parameter-efficient and training-efficient. For $\text{DeBERTa}_\text{XXL}$ with 1.6 billion parameters, $\mathcal{Y}$-tuning achieves more than $96\%$ of the performance of full fine-tuning on the GLUE benchmark with only $2\%$ tunable parameters and much lower training cost.
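The mechanism described in the abstract (a frozen PTM produces fixed input features, small learnable label representations are aligned to those features, and only the label-side parameters are trained) can be illustrated with a minimal PyTorch sketch. This is our own illustration under assumptions, not the authors' released implementation: the class name `YTuningHead`, the use of cross-attention as the alignment module, and the checkpoint `microsoft/deberta-base` are hypothetical choices made for the example.

```python
# Minimal sketch of the Y-Tuning idea (illustrative only, not the authors' code).
# A frozen PTM yields fixed token features; learnable label embeddings attend
# over those features, and only the label-side parameters receive gradients.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class YTuningHead(nn.Module):  # hypothetical module name
    def __init__(self, num_labels: int, hidden_size: int, num_heads: int = 8):
        super().__init__()
        # One learnable dense representation per label in Y.
        self.label_embeddings = nn.Parameter(torch.randn(num_labels, hidden_size))
        # Alignment module (here: cross-attention, labels as queries over fixed features).
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, text_features: torch.Tensor) -> torch.Tensor:
        # text_features: (batch, seq_len, hidden_size), produced by the frozen PTM.
        batch = text_features.size(0)
        queries = self.label_embeddings.unsqueeze(0).expand(batch, -1, -1)
        aligned, _ = self.cross_attn(queries, text_features, text_features)
        # One score per label; argmax gives the predicted class.
        return self.scorer(aligned).squeeze(-1)  # (batch, num_labels)


tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
backbone = AutoModel.from_pretrained("microsoft/deberta-base")
backbone.requires_grad_(False)  # PTM stays frozen: no gradients through its parameters

head = YTuningHead(num_labels=2, hidden_size=backbone.config.hidden_size)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)  # only label-side params

inputs = tokenizer(["the movie was great"], return_tensors="pt")
with torch.no_grad():  # features of the input text are fixed, not tuned
    features = backbone(**inputs).last_hidden_state
logits = head(features)  # (1, 2)

# Toy training step: gradients flow only into the Y-Tuning head.
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()
optimizer.step()
```

Because the backbone runs under `torch.no_grad()`, no activations or gradients are kept for its 1.6B (or however many) parameters, which is the source of the training-efficiency claim; only the small label head is updated.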