In Defense of Core-set: A Density-aware Core-set Selection for Active Learning

Paper title

In Defense of Core-set: A Density-aware Core-set Selection for Active Learning

Authors

Kim, Yeachan; Shin, Bonggun

Abstract

Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distance between samples. However, it performs poorly compared to uncertainty-based approaches, which select the most difficult samples, that is, those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to have more informative samples than dense regions. Motivated by our analysis, we empower the core-set approach with density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. To reduce the computational bottleneck in estimating the density, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks and specifically show that DACS can produce state-of-the-art performance in a practical scenario. Since DACS is weakly dependent on neural architectures, we present a simple yet effective combination method to show that existing methods can be beneficially combined with DACS.
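The two ideas named in the abstract, hash-based density estimation and distance-based diverse selection from sparse regions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (lsh_density, select_sparse_diverse), the use of sign random projections as the hash family, the bucket-count density proxy, and the rule of restricting candidates to the sparsest fraction of the pool are all assumptions made for illustration.

```python
import numpy as np


def lsh_density(features, n_bits=8, n_tables=4, seed=0):
    """Rough density proxy (assumed, not the paper's estimator):
    the average number of points that share a point's hash bucket."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Center the data so random hyperplanes through the origin cut the cloud.
    centered = features - features.mean(axis=0)
    density = np.zeros(n)
    for _ in range(n_tables):
        planes = rng.standard_normal((d, n_bits))
        bits = (centered @ planes > 0).astype(np.int64)
        # Pack the sign bits of each point into a single integer bucket key.
        keys = bits @ (1 << np.arange(n_bits))
        _, inverse, counts = np.unique(
            keys, return_inverse=True, return_counts=True
        )
        density += counts[inverse]
    return density / n_tables


def select_sparse_diverse(features, budget, sparse_fraction=0.5, seed=0):
    """Greedy k-center selection restricted to the sparsest candidates.
    The sparse_fraction cutoff is a hypothetical simplification."""
    n = len(features)
    budget = min(budget, n)
    density = lsh_density(features, seed=seed)
    n_candidates = max(budget, int(n * sparse_fraction))
    candidates = np.argsort(density, kind="stable")[:n_candidates]
    pts = features[candidates]

    # Start from the sparsest candidate, then repeatedly add the candidate
    # that is farthest from everything chosen so far.
    chosen = [0]
    min_dist = np.linalg.norm(pts - pts[0], axis=1)
    min_dist[0] = -np.inf  # never pick the same point twice
    while len(chosen) < budget:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(pts - pts[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return candidates[np.array(chosen)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pool = rng.standard_normal((200, 5))  # stand-in for model features
    print(select_sparse_diverse(pool, budget=10))
```

In an active learning loop, the rows of features would be the model's embeddings of the unlabeled pool, and the returned indices would be the samples sent for labeling in that round. Bucket counting keeps the density step close to linear in the pool size, which is the kind of saving the abstract attributes to locality-sensitive hashing.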
