Paper Title

End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

Authors

Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

Abstract

We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models, outperforming current state-of-the-art methods.
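The likelihood-based filtering step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the function names, and the choice to score a sample by the summed log-probability of all generated tokens are assumptions made for illustration, since the abstract does not say how the score is aggregated. The per-token log-probabilities are assumed to come from the decoder during generation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One generated question-answer pair (hypothetical container)."""
    question: str
    answer: str
    # log p(token | passage, previous tokens) for each generated token
    token_logprobs: List[float]


def sequence_log_likelihood(token_logprobs: List[float]) -> float:
    """Log-likelihood of the whole output: the sum of per-token log-probs.

    Assumption: summed (not length-normalized) scoring.
    """
    return sum(token_logprobs)


def filter_by_likelihood(candidates: List[Candidate], keep: int) -> List[Candidate]:
    """Keep the `keep` candidates that the generator itself scored highest."""
    ranked = sorted(
        candidates,
        key=lambda c: sequence_log_likelihood(c.token_logprobs),
        reverse=True,
    )
    return ranked[:keep]
```

Because the score is a by-product of decoding, no second model has to be trained or run to rank the synthetic samples, which is the design point the abstract emphasizes.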
