Paper Title
Parameter-efficient Model Adaptation for Vision Transformers
Paper Authors
Paper Abstract
In computer vision, adapting large-scale pretrained vision models (e.g., vision transformers) to downstream tasks has achieved strong transfer learning performance. Common approaches to model adaptation either update all model parameters or leverage linear probes. In this paper, we study parameter-efficient model adaptation strategies for vision transformers on the image classification task. We formulate efficient model adaptation as a subspace training problem and perform a comprehensive benchmark of different efficient adaptation methods. We conduct an empirical study of each efficient model adaptation method, focusing on its performance alongside its parameter cost. Furthermore, we propose a parameter-efficient model adaptation framework that first selects submodules by measuring local intrinsic dimensions and then projects them into a subspace for further decomposition via a novel Kronecker Adaptation (KAdaptation) method. We analyze and compare our method with a diverse set of baseline model adaptation methods (including state-of-the-art methods for pretrained language models). Our method achieves the best tradeoff between accuracy and parameter efficiency across 20 image classification datasets under the few-shot setting and 7 image classification datasets under the full-shot setting.
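The central mechanism named in the abstract is the Kronecker Adaptation (KAdaptation) update, in which the weight change of a selected submodule is parameterized as a sum of Kronecker products whose second factors are further decomposed into low-rank terms. Below is a minimal PyTorch sketch of that parameterization, assuming an update of the form ΔW = Σ_i A_i ⊗ (u_i v_iᵀ) added to a frozen pretrained weight; the class name, factor shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Linear layer whose frozen weight W receives a trainable update
    dW = sum_i A_i kron (u_i v_i^T); only the small factors are tuned."""

    def __init__(self, weight, n_terms=4, a_out=4, a_in=4, rank=8):
        super().__init__()
        d_out, d_in = weight.shape
        # Factor shapes must tile the weight: (a_out x a_in) kron (b_out x b_in).
        assert d_out % a_out == 0 and d_in % a_in == 0
        b_out, b_in = d_out // a_out, d_in // a_in
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen pretrained weight
        # A_i are zero-initialized so the adapted layer starts identical to the original.
        self.A = nn.Parameter(torch.zeros(n_terms, a_out, a_in))
        # The second Kronecker factor is itself low-rank: B_i = u_i v_i^T.
        self.u = nn.Parameter(torch.randn(n_terms, b_out, rank) * 0.02)
        self.v = nn.Parameter(torch.randn(n_terms, rank, b_in) * 0.02)

    def forward(self, x):
        B = self.u @ self.v  # (n_terms, b_out, b_in)
        # Batched Kronecker product summed over terms:
        # delta[i*b_out + k, j*b_in + l] = sum_n A[n, i, j] * B[n, k, l]
        delta = torch.einsum('nij,nkl->ikjl', self.A, B)
        delta = delta.reshape(self.weight.shape)
        return x @ (self.weight + delta).T

# Usage: adapt a (hypothetical) 768x768 attention projection from a frozen ViT.
layer = KroneckerAdapter(torch.randn(768, 768))
out = layer(torch.randn(2, 768))  # -> shape (2, 768)
```

Under these illustrative shapes the trainable parameter count is n_terms * (a_out * a_in + rank * (b_out + b_in)), e.g. about 12K parameters for a 768x768 weight versus roughly 590K for full fine-tuning of that matrix, which is the accuracy-versus-parameter-cost tradeoff the abstract refers to.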