LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning

Paper Title

LADA: Look-Ahead Data Acquisition via Augmentation for Active Learning

Authors

Yoon-Yeong Kim, Kyungwoo Song, JoonHo Jang, Il-Chul Moon

Abstract

Active learning effectively collects data instances for training deep learning models when the labeled dataset is limited and the annotation cost is high. Besides active learning, data augmentation is also an effective technique to enlarge the limited amount of labeled instances. However, the potential gain from virtual instances generated by data augmentation has not yet been considered in the acquisition process of active learning. Looking ahead to the effect of data augmentation during acquisition allows the selection and generation of data instances that are informative for training the model. Hence, this paper proposes Look-Ahead Data Acquisition via Augmentation, or LADA, to integrate data acquisition and data augmentation. In advance of the acquisition process, LADA considers both 1) the unlabeled data instances to be selected and 2) the virtual data instances to be generated by data augmentation. Moreover, to enhance the informativeness of the virtual data instances, LADA optimizes the data augmentation policy to maximize the predictive acquisition score, resulting in the proposal of InfoMixup and InfoSTN. As LADA is a generalizable framework, we experiment with various combinations of acquisition and augmentation methods. On benchmark datasets, LADA shows a significant performance improvement over recent baselines that apply augmentation and acquisition independently.
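The abstract describes scoring unlabeled instances together with the virtual instances that augmentation would produce, and tuning the augmentation policy to raise the acquisition score. The Python below is a minimal sketch of that idea under several assumptions that are not taken from the paper: a toy linear softmax classifier, predictive entropy as the acquisition function, a pair score equal to the sum of the two real instances' scores plus their mixup instance's score, and a grid search over the mixup ratio λ standing in for the optimized policy behind InfoMixup. All function names are hypothetical, and InfoSTN is not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Hypothetical toy model: one weight row per class, logits = W @ x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def entropy_score(probs: Sequence[float]) -> float:
    """Predictive entropy, used here as a stand-in acquisition function."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mixup(x1: Sequence[float], x2: Sequence[float], lam: float) -> Vector:
    """Virtual instance lam * x1 + (1 - lam) * x2; no label is needed to score it."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def best_mixing_ratio(weights, x1, x2, grid: Sequence[float]) -> Tuple[float, float]:
    """Return (lam, score) for the virtual instance with the highest acquisition score.
    Assumption: grid search replaces the paper's optimized augmentation policy."""
    best_lam, best_score = grid[0], -math.inf
    for lam in grid:
        score = entropy_score(predict_proba(weights, mixup(x1, x2, lam)))
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score


def look_ahead_score(weights, x1, x2, grid) -> Tuple[float, float]:
    """Assumed pair score: both real instances' scores plus their best virtual instance's."""
    lam, virtual = best_mixing_ratio(weights, x1, x2, grid)
    real = entropy_score(predict_proba(weights, x1)) + entropy_score(predict_proba(weights, x2))
    return real + virtual, lam


def select_pair(weights, pool, grid) -> Tuple[float, int, int, float]:
    """Score every unlabeled pair exhaustively; return (score, i, j, lam) of the best one."""
    best: Optional[Tuple[float, int, int, float]] = None
    for i in range(len(pool)):
        for j in range(i + 1, len(pool)):
            score, lam = look_ahead_score(weights, pool[i], pool[j], grid)
            if best is None or score > best[0]:
                best = (score, i, j, lam)
    if best is None:
        raise ValueError("pool needs at least two instances")
    return best


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]           # 2 classes, 2 features
    pool = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0]]  # unlabeled instances
    grid = [k / 10 for k in range(1, 10)]        # candidate mixing ratios
    print(select_pair(weights, pool, grid))
```

With these toy inputs the pair (0, 1) is selected with λ = 0.5: mixing [2, 0] and [0, 2] equally gives [1, 1], which the model scores at the maximum two-class entropy, ln 2. Exhaustive pair scoring is quadratic in the pool size and is only meant to keep the sketch readable.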
