Paper Title

Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling

Paper Authors

Yunsung Lee, Gyuseong Lee, Kwangrok Ryoo, Hyojun Go, Jihye Park, Seungryong Kim

Paper Abstract

There are two de facto standard architectures in recent computer vision: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). The strong inductive biases of convolutions help the model learn sample-efficiently, but such strong biases also limit the upper bound of CNNs when sufficient data are available. Conversely, ViTs are inferior to CNNs for small data but superior when data are plentiful. Recent approaches attempt to combine the strengths of these two architectures. However, by comparing the accuracy of various models on subsets of ImageNet sampled at different ratios, we show that these approaches overlook that the optimal inductive bias also changes with the target data scale. In addition, through Fourier analysis of feature maps, which characterizes a model's response patterns across signal frequencies, we observe which inductive bias is advantageous at each data scale. The more convolution-like inductive bias a model contains, the smaller the data scale at which the ViT-like model outperforms ResNet. To obtain a model whose inductive bias is flexible with respect to data scale, we show that reparameterization can interpolate the inductive bias between convolution and self-attention. By adjusting the number of epochs the model stays as a convolution, we show that reparameterization from convolution to self-attention interpolates the Fourier analysis pattern between CNNs and ViTs. Building on these findings, we propose Progressive Reparameterization Scheduling (PRS), in which reparameterization adjusts the required amount of convolution-like or self-attention-like inductive bias per layer. For small-scale datasets, PRS performs the reparameterization from convolution to self-attention earlier at later layers, following a linear schedule. PRS outperforms previous studies on small-scale datasets, e.g., CIFAR-100.
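As a rough illustration of the scheduling idea described in the abstract, the sketch below assumes a stack of layers that each begin as a convolution and are reparameterized into self-attention at a layer-dependent epoch, with later layers switching earlier along a linear schedule. All function and parameter names (`prs_switch_epochs`, `is_self_attention`, `num_layers`, `total_epochs`) are hypothetical; the paper's actual scheduling rule and reparameterization mechanics may differ.

```python
# A minimal sketch, not the authors' released code: a linear per-layer
# schedule in the spirit of PRS, where later layers are reparameterized
# from convolution to self-attention earlier in training.

def prs_switch_epochs(num_layers: int, total_epochs: int) -> list[int]:
    """Epoch at which each layer switches from convolution to self-attention.

    Switch epochs are spaced linearly over layers, so the last layer
    switches first and the first layer keeps its convolution-like
    inductive bias the longest.
    """
    return [
        round(total_epochs * (num_layers - 1 - i) / num_layers)
        for i in range(num_layers)
    ]


def is_self_attention(layer_idx: int, epoch: int, switch_epochs: list[int]) -> bool:
    """True once the given layer has been reparameterized into self-attention."""
    return epoch >= switch_epochs[layer_idx]


if __name__ == "__main__":
    schedule = prs_switch_epochs(num_layers=12, total_epochs=100)
    print(schedule)  # later layers receive smaller (earlier) switch epochs
    print([is_self_attention(i, epoch=50, switch_epochs=schedule)
           for i in range(12)])
```

In this sketch the schedule only decides when each layer changes form; how the learned convolution weights are mapped onto self-attention weights at the switch point is left out, since that depends on the specific reparameterization used.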
