Title
Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin
Authors
Abstract
In supervised learning (for instance, in image classification), modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training, generally by leveraging a per-worker trust score. Yet such worker-oriented approaches ignore the tasks' ambiguity. Ambiguous tasks might fool expert workers, which is often harmful for the learning step. In standard supervised learning settings, with one label per task, the Area Under the Margin (AUM) was designed to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted Areas Under the Margin (WAUM). The WAUM is an average of AUMs weighted according to task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization performance. We report improvements over existing strategies for learning with a crowd, both in simulated settings and on real datasets such as CIFAR-10H (a crowdsourced dataset with many labels per task), LabelMe and Music (two datasets with few votes per task).
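To make the two quantities concrete, here is a minimal NumPy sketch, assuming the standard AUM definition: at each training epoch, the margin of a task for a given label is the logit of that label minus the largest other logit, and the AUM is the mean margin over epochs. The WAUM is then sketched as a weighted average of the AUMs computed for each worker's label on the same task. The per-worker, task-dependent weights are taken as given inputs; how the paper actually computes them is not reproduced here. The function names (`aum`, `waum`, `keep_mask`) and the quantile-based pruning rule with parameter `alpha` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def aum(logits: np.ndarray, label: int) -> float:
    """Area Under the Margin for one task and one candidate label.

    logits: array of shape (n_epochs, n_classes) holding the network's
    outputs for this task, recorded once per training epoch.
    """
    assigned = logits[:, label]                    # logit of the given label
    others = np.delete(logits, label, axis=1)      # all remaining logits
    margins = assigned - others.max(axis=1)        # per-epoch margin
    return float(margins.mean())                   # average over epochs


def waum(logits: np.ndarray, worker_labels, weights) -> float:
    """Weighted average of AUMs over the labels given by workers to one task.

    weights: hypothetical nonnegative task-dependent score per worker
    (assumed to be supplied by the caller).
    """
    aums = np.array([aum(logits, y) for y in worker_labels])
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, aums) / w.sum())


def keep_mask(waums: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Hypothetical pruning rule: flag tasks whose WAUM falls below the
    alpha-quantile as ambiguous (False) and keep the others (True)."""
    threshold = np.quantile(waums, alpha)
    return waums >= threshold


# Usage: one task, two recorded epochs, three classes, three workers.
logits = np.array([[2.0, 1.0, 0.0],
                   [3.0, 0.5, 0.0]])
print(aum(logits, 0))                          # margins 1.0 and 2.5
print(waum(logits, [0, 0, 1], [1.0, 1.0, 0.5]))
```

A low (or negative) WAUM means that, once weighted, the workers' labels tend to disagree with what the network confidently predicts during training. This is the signal used here to flag a task as ambiguous before removing it from the training set.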