Paper Title
Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering
Paper Authors
Paper Abstract
Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is first and foremost a problem of mitigating source domain underfitting: models not adequately learning the signal already present in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model learns its source domains better -- using familiar methods such as knowledge distillation (KD) from a bigger model -- its zero-shot out-of-domain utility improves at an even faster pace. Improved source domain learning also demonstrates superior out-of-domain generalization over three popular existing DG approaches that aim to limit overfitting. Our implementation of KD-based domain generalization is available via PrimeQA at: https://ibm.biz/domain-generalization-with-kd.
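The abstract's central ingredient is knowledge distillation (KD) from a larger teacher model into the reader that is actually deployed, as a way to learn the source domains better. Below is a minimal sketch of a standard KD objective for extractive QA; the temperature `T`, the loss weight `alpha`, and the per-position loss layout are illustrative assumptions rather than the paper's exact configuration (the actual implementation is available via PrimeQA at the link above).

```python
# Minimal sketch of knowledge distillation for extractive QA.
# Generic recipe: T, alpha, and the start/end-position loss layout are
# illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, gold_positions, T=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with a soft-label KL term to the teacher.

    student_logits / teacher_logits: (batch, seq_len) start- or end-position scores.
    gold_positions: (batch,) gold answer start (or end) token indices.
    """
    # Hard loss: standard cross-entropy against the annotated answer positions.
    hard = F.cross_entropy(student_logits, gold_positions)

    # Soft loss: KL divergence between temperature-scaled student and teacher
    # distributions over token positions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-loss magnitude

    return alpha * hard + (1.0 - alpha) * soft
```

In a typical setup the same loss is computed separately for answer-start and answer-end logits and averaged; the teacher's logits are produced with `torch.no_grad()` so only the student is updated.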