Paper Title

Source-Free Domain Adaptation for Question Answering with Masked Self-training

Authors

Yin, M., Wang, B., Dong, Y., Ling, C.

Abstract


Most previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data may, however, contain sensitive information and may be restricted. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to source domain data. We propose a novel self-training approach for QA models that integrates a unique mask module for domain adaptation. The mask is auto-adjusted to extract key domain knowledge while trained on the source domain. To maintain previously learned domain knowledge, certain mask weights are frozen during adaptation, while other weights are adjusted to mitigate domain shift with pseudo-labeled samples generated in the target domain. Our empirical results on four benchmark datasets suggest that our approach significantly enhances the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
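The core mechanism described in the abstract — freeze the mask weights that encode source-domain knowledge, and update only the remaining weights with gradients from pseudo-labeled target samples — can be illustrated with a minimal NumPy sketch. This is a toy illustration of the general idea only, not the paper's implementation: the weight matrix, the importance scores, the median-based freezing threshold, and the fake pseudo-label gradient are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): weights of a "source-pretrained" model and
# mask scores rating each weight's importance to the source domain.
W = rng.normal(size=(4, 4))              # source model weights
mask_scores = rng.uniform(size=W.shape)  # mask learned on the source domain

# Freeze the mask entries deemed most important, to preserve source knowledge.
# The top-50% threshold is an illustrative assumption, not from the paper.
frozen = mask_scores > np.median(mask_scores)

def adaptation_step(W, grad, lr=0.1):
    """Apply a pseudo-label gradient only to the non-frozen weights."""
    update = np.where(frozen, 0.0, lr * grad)
    return W - update

# One adaptation step with a fake gradient standing in for the gradient
# computed on pseudo-labeled target-domain samples.
pseudo_grad = rng.normal(size=W.shape)
W_adapted = adaptation_step(W, pseudo_grad)

# Frozen weights are untouched; the rest have shifted toward the target domain.
print(np.allclose(W_adapted[frozen], W[frozen]))  # → True
```

In the actual method this selective update runs inside a self-training loop, where the source model first generates the pseudo-labels on target-domain data.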
