Paper Title

FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer

Paper Authors

Shibo Jie, Zhi-Hong Deng

Paper Abstract

Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating only a few parameters so as to improve storage efficiency, called parameter-efficient transfer learning (PETL). Current PETL methods have shown that by tuning only 0.5% of the parameters, ViT can be adapted to downstream tasks with even better performance than full fine-tuning. In this paper, we aim to further promote the efficiency of PETL to meet the extreme storage constraints of real-world applications. To this end, we propose a tensorization-decomposition framework to store the weight increments, in which the weights of each ViT are tensorized into a single 3D tensor, and their increments are then decomposed into lightweight factors. In the fine-tuning process, only the factors need to be updated and stored, termed Factor-Tuning (FacT). On the VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5x more parameter-efficient. We also present a tiny version that uses only 8K trainable parameters (0.01% of ViT's parameters) but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines using the fewest parameters, demonstrating its strong capability in the low-data regime.
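To make the tensorization-decomposition idea concrete, the following is a minimal numpy sketch of storing a weight increment as small shared factors plus per-matrix cores, in the spirit of a Tucker-like decomposition. The dimensions, rank `r`, and exact factor layout are illustrative assumptions (the abstract does not specify them), not the paper's actual implementation.

```python
import numpy as np

# Hypothetical ViT-B-like sizes: L = 12 blocks, hidden size d = 768.
# Assume 12 d-by-d weight matrices per block are stacked ("tensorized")
# into one 3D tensor of shape (12 * L, d, d).
L, d, r = 12, 768, 8          # r: factorization rank (assumed)
n_mats = 12 * L               # total number of stacked weight matrices

rng = np.random.default_rng(0)

# Only these small factors would be trained and stored per task.
U = rng.standard_normal((d, r)) * 0.01            # shared column factor
V = rng.standard_normal((d, r)) * 0.01            # shared row factor
Sigma = rng.standard_normal((n_mats, r, r)) * 0.01  # per-matrix core

# Reconstruct the full increment tensor on the fly:
# delta_W[m] = U @ Sigma[m] @ V.T for each matrix m.
delta_W = np.einsum('ir,mrs,js->mij', U, Sigma, V, optimize=True)

factor_params = U.size + V.size + Sigma.size      # what gets stored
full_params = n_mats * d * d                      # full increment size
print(delta_W.shape)          # (144, 768, 768)
print(factor_params)          # 21504 -- orders of magnitude smaller
print(full_params)            # 84934656
```

The storage saving comes from replacing `n_mats * d * d` increment entries with `2 * d * r + n_mats * r * r` factor entries; at fine-tuning time the factors are the only trainable parameters, and the frozen pre-trained weights are shared across tasks.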
