General Domain Adaptation Through Proportional Progressive Pseudo Labeling

Paper Title

General Domain Adaptation Through Proportional Progressive Pseudo Labeling

Authors

Mohammad J. Hashemi, Eric Keller

Abstract

Domain adaptation helps transfer the knowledge gained from a labeled source domain to an unlabeled target domain. In the past few years, various domain adaptation techniques have been published. A common flaw of these approaches is that, while they may work well on one input type such as images, their performance drops when they are applied to others such as text or time series. In this paper, we introduce Proportional Progressive Pseudo Labeling (PPPL), a simple yet effective technique that can be implemented in a few lines of code and yields a more general domain adaptation method applicable to several different input types. From the beginning of the training phase, PPPL progressively reduces the target-domain classification error by training the model directly on pseudo-labeled target-domain samples, while excluding samples whose pseudo-labels are more likely to be wrong from the training set and postponing training on those samples. Experiments on 6 datasets covering tasks such as anomaly detection, text sentiment analysis, and image classification demonstrate that PPPL outperforms other baselines and generalizes better.
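The abstract describes PPPL only at a high level, so the following is a minimal sketch of one plausible reading of its sample-selection step, not the authors' implementation. It assumes that a pseudo-label's reliability is measured by the model's maximum predicted class probability, that "proportional progressive" means the fraction of target samples used for training grows over epochs (here on a hypothetical linear schedule from `start` to `end`), and that samples outside the current fraction are simply postponed to later epochs. The names `proportion_for_epoch` and `select_pseudo_labels` are hypothetical; the paper may define the proportion differently (for example, per class).

```python
from typing import List, Sequence, Tuple


def proportion_for_epoch(epoch: int, num_epochs: int,
                         start: float = 0.1, end: float = 1.0) -> float:
    """Hypothetical linear schedule for the fraction of target samples to pseudo-label.

    The fraction grows from `start` at epoch 0 to `end` at the last epoch.
    """
    if num_epochs <= 1:
        return end
    t = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return start + (end - start) * t


def select_pseudo_labels(probs: Sequence[Sequence[float]],
                         proportion: float) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs for the most confident samples.

    Assumption: confidence is the maximum predicted class probability.
    Only the top `proportion` of target samples is kept for this epoch;
    the rest (more likely to carry wrong pseudo-labels) are postponed.
    """
    scored = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax class
        scored.append((p[label], i, label))
    # Most confident first; ties broken by original sample index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    k = min(len(scored), max(0, round(proportion * len(scored))))
    return [(i, label) for _, i, label in scored[:k]]


if __name__ == "__main__":
    # Toy softmax outputs for 4 target samples and 2 classes.
    probs = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45], [0.2, 0.8]]
    for epoch in range(3):
        frac = proportion_for_epoch(epoch, 3)
        print(epoch, round(frac, 2), select_pseudo_labels(probs, frac))
```

In a full training loop under these assumptions, the selected pairs would be added to the target-domain training set for that epoch, and the model's predictions (and thus the ranking) would be recomputed before the next, larger selection.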
