Paper Title

FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer

Paper Authors

Shibo Jie, Zhi-Hong Deng

Paper Abstract

Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating only a few parameters so as to improve storage efficiency, called parameter-efficient transfer learning (PETL). Current PETL methods have shown that by tuning only 0.5% of the parameters, ViT can be adapted to downstream tasks with even better performance than full fine-tuning. In this paper, we aim to further promote the efficiency of PETL to meet the extreme storage constraints of real-world applications. To this end, we propose a tensorization-decomposition framework to store the weight increments, in which the weights of each ViT are tensorized into a single 3D tensor, and their increments are then decomposed into lightweight factors. In the fine-tuning process, only the factors need to be updated and stored, termed Factor-Tuning (FacT). On the VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5x more parameter-efficient. We also present a tiny version that uses only 8K trainable parameters (0.01% of ViT's parameters) but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines using the fewest parameters, demonstrating its strong capability in the low-data regime.
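To make the tensorization-decomposition idea concrete, the following is a minimal numpy sketch of storing a weight increment as small shared factors plus per-matrix cores, in the spirit of a Tucker-like decomposition. The dimensions, rank `r`, and exact factor layout are illustrative assumptions (the abstract does not specify them), not the paper's actual implementation.

```python
import numpy as np

# Hypothetical ViT-B-like sizes: L = 12 blocks, hidden size d = 768.
# Assume 12 d-by-d weight matrices per block are stacked ("tensorized")
# into one 3D tensor of shape (12 * L, d, d).
L, d, r = 12, 768, 8          # r: factorization rank (assumed)
n_mats = 12 * L               # total number of stacked weight matrices

rng = np.random.default_rng(0)

# Only these small factors would be trained and stored per task.
U = rng.standard_normal((d, r)) * 0.01            # shared column factor
V = rng.standard_normal((d, r)) * 0.01            # shared row factor
Sigma = rng.standard_normal((n_mats, r, r)) * 0.01  # per-matrix core

# Reconstruct the full increment tensor on the fly:
# delta_W[m] = U @ Sigma[m] @ V.T for each matrix m.
delta_W = np.einsum('ir,mrs,js->mij', U, Sigma, V, optimize=True)

factor_params = U.size + V.size + Sigma.size      # what gets stored
full_params = n_mats * d * d                      # full increment size
print(delta_W.shape)          # (144, 768, 768)
print(factor_params)          # 21504 -- orders of magnitude smaller
print(full_params)            # 84934656
```

The storage saving comes from replacing `n_mats * d * d` increment entries with `2 * d * r + n_mats * r * r` factor entries; at fine-tuning time the factors are the only trainable parameters, and the frozen pre-trained weights are shared across tasks.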
