Paper Title

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Paper Authors

Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

Paper Abstract

Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we incorporate additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and to obtain more informative representations. ReLICv2 achieves $77.1\%$ top-$1$ accuracy on ImageNet under linear evaluation on a ResNet50, improving the previous state-of-the-art by an absolute $+1.5\%$; on larger ResNet models, ReLICv2 achieves up to $80.6\%$, outperforming previous self-supervised approaches by margins of up to $+2.3\%$. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison over a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, on both image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
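To make the objective described in the abstract concrete, below is a minimal PyTorch-style sketch of a ReLIC-style loss: an InfoNCE contrastive term combined with an explicit invariance penalty (a KL divergence between the similarity distributions of two augmented views). This is an illustrative sketch, not the authors' exact formulation: the function name `relic_style_loss` and the weighting parameter `alpha` are assumptions, and ReLICv2 operates over a larger, varied set of views than the two shown here.

```python
import torch
import torch.nn.functional as F

def relic_style_loss(z1, z2, temperature=0.1, alpha=1.0):
    """Sketch of a ReLIC-style objective (illustrative, not the paper's code).

    z1, z2: [N, D] embeddings of two augmented views of the same N images.
    alpha:  weight of the invariance (KL) regularizer (hypothetical name).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Pairwise similarities between views; diagonal entries are positives.
    logits12 = z1 @ z2.t() / temperature  # [N, N]
    logits21 = z2 @ z1.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)

    # Contrastive (InfoNCE) term: identify the matching view among all candidates.
    contrastive = 0.5 * (F.cross_entropy(logits12, targets) +
                         F.cross_entropy(logits21, targets))

    # Explicit invariance term: KL divergence between the two views'
    # similarity distributions, encouraging predictions to be invariant
    # to which augmentation was applied.
    log_p12 = F.log_softmax(logits12, dim=1)
    log_p21 = F.log_softmax(logits21, dim=1)
    invariance = F.kl_div(log_p12, log_p21, log_target=True,
                          reduction="batchmean")

    return contrastive + alpha * invariance
```

In use, one would encode two augmented views of a batch with the same encoder and feed the embeddings to this loss, e.g. `loss = relic_style_loss(encoder(view1), encoder(view2))`.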
