Paper Title
Fix-A-Step: Semi-supervised Learning from Uncurated Unlabeled Data
Authors
Abstract
Semi-supervised learning (SSL) promises improved accuracy compared to training classifiers on small labeled datasets by also training on many unlabeled images. In real applications like medical imaging, unlabeled data will be collected for expediency and thus uncurated: possibly different from the labeled set in classes or features. Unfortunately, modern deep SSL often makes accuracy worse when given uncurated unlabeled data. Recent complex remedies try to detect out-of-distribution unlabeled images and then discard or downweight them. Instead, we introduce Fix-A-Step, a simpler procedure that views all uncurated unlabeled images as potentially helpful. Our first insight is that even uncurated images can yield useful augmentations of labeled data. Second, we modify gradient descent updates to prevent optimizing a multi-task SSL loss from hurting labeled-set accuracy. Fix-A-Step can repair many common deep SSL methods, improving accuracy on CIFAR benchmarks across all tested methods and levels of artificial class mismatch. On a new medical SSL benchmark called Heart2Heart, Fix-A-Step can learn from 353,500 truly uncurated ultrasound images to deliver gains that generalize across hospitals.
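The abstract's second idea, modifying gradient descent so that the unlabeled part of a multi-task SSL loss cannot hurt labeled-set accuracy, can be illustrated with a small sketch. The abstract does not state the exact rule, so the following is an assumption: the unlabeled-loss gradient is used only when it points in a direction that agrees with the labeled-loss gradient (positive inner product), and is dropped otherwise. The names `guarded_step`, `unlabeled_weight`, and `lr` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def guarded_step(
    params: Sequence[float],
    grad_labeled: Sequence[float],
    grad_unlabeled: Sequence[float],
    lr: float = 0.1,
    unlabeled_weight: float = 1.0,
) -> List[float]:
    """One gradient-descent update on a two-term (labeled + unlabeled) loss.

    Assumed rule (not taken from the abstract): include the unlabeled
    gradient only if its inner product with the labeled gradient is
    positive; otherwise step along the labeled gradient alone.
    """
    if dot(grad_labeled, grad_unlabeled) > 0.0:
        # Gradients agree: take the usual multi-task step.
        direction = [
            gl + unlabeled_weight * gu
            for gl, gu in zip(grad_labeled, grad_unlabeled)
        ]
    else:
        # Gradients conflict (or are orthogonal): ignore the unlabeled term.
        direction = list(grad_labeled)
    return [p - lr * d for p, d in zip(params, direction)]


# Agreeing gradients: both terms contribute to the step.
agree = guarded_step([1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
# Conflicting gradients: only the labeled gradient is used.
conflict = guarded_step([1.0, 1.0], [1.0, 0.0], [-2.0, 1.0])
```

In a real deep-learning setup the two gradients would be the flattened parameter gradients of the labeled and unlabeled loss terms for one minibatch; plain Python lists are used here only to keep the sketch self-contained.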