MixMask: Revisiting Masking Strategy for Siamese ConvNets

Paper Title

MixMask: Revisiting Masking Strategy for Siamese ConvNets

Authors

Kirill Vishniakov, Eric Xing, Zhiqiang Shen

Abstract

Recent progress in self-supervised learning has successfully combined Masked Image Modeling (MIM) with Siamese networks, exploiting the strengths of both approaches. However, integrating conventional erase-based masking into Siamese ConvNets still poses two main challenges: (1) ConvNets process the input as a continuous grid, so unlike ViTs they cannot discard non-informative masked regions, which lowers training efficiency relative to ViT architectures; and (2) erase-based masking is misaligned with the contrastive objective of Siamese networks, in contrast to MIM, whose objective is to reconstruct the masked content. To address these challenges, this work introduces a novel filling-based masking approach, termed MixMask. MixMask replaces the erased regions with content from a different image, counteracting the information loss caused by conventional masking. In addition, we introduce an adaptive loss function that captures the semantics of the resulting mixed views, so that the method integrates seamlessly into the Siamese framework. We validate the approach through comprehensive experiments across various datasets and application scenarios, which show improved performance in linear probing, semi-supervised and supervised fine-tuning, object detection, and segmentation. Notably, our method outperforms MSCN (Masked Siamese ConvNets), establishing MixMask as a more effective masking solution for Siamese ConvNets. Our code and models are publicly available at https://github.com/kirill-vish/MixMask.
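The contrast between erase-based and filling-based masking described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the block-grid mask, the 50% mask ratio, pairing each image with a single second image, and the helper names `grid_mask`, `erase_mask`, and `mix_mask` are all hypothetical choices made for illustration. Images are single-channel H×W nested lists so the example runs with the standard library alone.

```python
import random


def grid_mask(height, width, grid, mask_ratio, rng):
    """Hypothetical helper: build an H x W binary mask from a grid x grid block layout.

    1 = keep the pixel from the original image, 0 = region to be masked.
    The block layout and ratio are illustrative assumptions, not the paper's settings.
    """
    if height % grid or width % grid:
        raise ValueError("height and width must be divisible by grid")
    cells = grid * grid
    n_masked = round(cells * mask_ratio)
    masked_cells = set(rng.sample(range(cells), n_masked))
    cell_h, cell_w = height // grid, width // grid
    mask = [[1] * width for _ in range(height)]
    for c in masked_cells:
        r0, c0 = (c // grid) * cell_h, (c % grid) * cell_w
        for i in range(r0, r0 + cell_h):
            for j in range(c0, c0 + cell_w):
                mask[i][j] = 0
    return mask


def erase_mask(image, mask, fill=0.0):
    """Erase-based masking: masked pixels are replaced by a constant, so their information is lost."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def mix_mask(image, other, mask):
    """Filling-based masking (hypothetical sketch): masked pixels are filled from another image.

    Also returns lam, the fraction of the mixed view that comes from `image`.
    """
    mixed = [[a if m else b for a, b, m in zip(r1, r2, mrow)]
             for r1, r2, mrow in zip(image, other, mask)]
    kept = sum(map(sum, mask))
    lam = kept / (len(mask) * len(mask[0]))
    return mixed, lam


# Usage: two constant 8x8 "images", a 4x4 block grid with half the blocks masked.
rng = random.Random(0)
img_a = [[1.0] * 8 for _ in range(8)]
img_b = [[2.0] * 8 for _ in range(8)]
mask = grid_mask(8, 8, grid=4, mask_ratio=0.5, rng=rng)
erased = erase_mask(img_a, mask)
mixed, lam = mix_mask(img_a, img_b, mask)
print(lam)  # 0.5
```

The returned ratio `lam` records how much of the mixed view comes from each source image. A loss that adapts to the mixed content would plausibly need such a quantity, but the abstract does not specify how MixMask's adaptive loss is defined, so treat that connection as an assumption.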
