Paper Title


CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Paper Authors

Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

Abstract


We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.
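The cross-split label correction idea can be illustrated with a minimal sketch: the label presented to one network is softened toward the prediction of its peer network, which was trained on the other data split and therefore cannot have memorized this example's (possibly noisy) label. The confidence-based mixing weight below is an illustrative assumption, not the paper's exact weighting scheme.

```python
# Hedged sketch of cross-split label correction (pure Python, no framework).
# The peer-confidence mixing weight is an illustrative choice, not the
# paper's exact formula.

def one_hot(label, num_classes):
    """Return a one-hot probability vector for an integer class label."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]

def correct_label(given_label, peer_probs):
    """Blend the given (possibly noisy) one-hot label with the peer
    network's predicted class distribution.

    The peer network was trained on the *other* half of the dataset,
    so it cannot have memorized this example's label. Here the mixing
    weight w is simply the peer's confidence in its top prediction.
    """
    num_classes = len(peer_probs)
    hard = one_hot(given_label, num_classes)
    w = max(peer_probs)  # peer confidence, in [1/num_classes, 1]
    return [(1.0 - w) * h + w * p for h, p in zip(hard, peer_probs)]

# Example: the dataset says class 0, but the peer confidently predicts class 2;
# the corrected soft label shifts most of its mass toward class 2.
soft = correct_label(0, [0.05, 0.05, 0.90])
```

A full implementation would apply this correction inside each network's training loop, alongside the second ingredient: treating the other split's inputs as unlabeled data for semi-supervised training.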
