Paper Title
Importance Reweighting for Biquality Learning
Paper Authors
Paper Abstract
The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies", namely: poor quality, non-adaptability, and insufficient quantity of labels. Regarding quality, label noise can be of different types, including completely-at-random, at-random, or even not-at-random. All these kinds of label noise are addressed separately in the literature, leading to highly specialized approaches. This paper proposes an original, encompassing view of Weakly Supervised Learning, which results in the design of generic approaches capable of dealing with any kind of label noise. For this purpose, an alternative setting called "Biquality data" is used. It assumes that a small trusted dataset of correctly labeled examples is available, in addition to an untrusted dataset of noisy examples. In this paper, we propose a new reweighting scheme capable of identifying non-corrupted examples in the untrusted dataset. This allows one to learn classifiers using both datasets. Extensive experiments that simulate several types of label noise and that vary the quality and quantity of untrusted examples demonstrate that the proposed approach outperforms baselines and state-of-the-art approaches.
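To make the biquality setting concrete, here is a minimal sketch, not the paper's exact algorithm: one plausible reweighting scheme fits one probabilistic classifier on the trusted data and one on the untrusted data, then weights each untrusted example by the ratio of the probabilities its (possibly noisy) label receives under the two models, so that corrupted examples tend to receive low weight. The noise rate, model choice, and datasets below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic biquality setup (illustrative only): a small trusted set
# and a larger untrusted set whose labels we corrupt.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_unt, y_tr, y_unt = train_test_split(X, y, train_size=0.1, random_state=0)

# Flip 30% of the untrusted labels completely at random (binary task).
rng = np.random.default_rng(0)
flip = rng.random(len(y_unt)) < 0.3
y_unt_noisy = np.where(flip, 1 - y_unt, y_unt)

# One probabilistic classifier per dataset.
clf_trusted = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
clf_untrusted = LogisticRegression(max_iter=1000).fit(X_unt, y_unt_noisy)

# Weight each untrusted example by the probability ratio of its observed
# label under the trusted vs. untrusted model; corrupted examples get
# low probability from the trusted model and thus a low weight.
idx = np.arange(len(y_unt_noisy))
p_trusted = clf_trusted.predict_proba(X_unt)[idx, y_unt_noisy]
p_untrusted = clf_untrusted.predict_proba(X_unt)[idx, y_unt_noisy]
weights = p_trusted / np.clip(p_untrusted, 1e-12, None)

# Final classifier trained on both datasets, trusted examples at weight 1.
X_all = np.vstack([X_tr, X_unt])
y_all = np.concatenate([y_tr, y_unt_noisy])
w_all = np.concatenate([np.ones(len(y_tr)), weights])
final_clf = LogisticRegression(max_iter=1000).fit(X_all, y_all, sample_weight=w_all)
```

Under completely-at-random flipping, the untrusted posterior is a mixture of the clean posterior and its complement, so the ratio is well below 1 on the flipped examples and above 1 on the clean ones, which is exactly the down-weighting effect the abstract describes.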