Title

Data Augmentation for Biomedical Factoid Question Answering

Authors

Dimitris Pappas, Prodromos Malakasiotis, Ion Androutsopoulos

Abstract

We study the effect of seven data augmentation (DA) methods in factoid question answering, focusing on the biomedical domain, where obtaining training instances is particularly difficult. We experiment with data from the BioASQ challenge, which we augment with training instances obtained from an artificial biomedical machine reading comprehension dataset, or via back-translation, information retrieval, word substitution based on word2vec embeddings or masked language modeling, question generation, or extending the given passage with additional context. We show that DA can lead to very significant performance gains, even when using large pre-trained Transformers, contributing to a broader discussion of whether and when DA benefits large pre-trained models. One of the simplest DA methods, word2vec-based word substitution, performed best and is recommended. We release our artificial training instances and code.
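
To make the recommended method concrete, here is a minimal sketch of word2vec-based word substitution. It rests on several assumptions. Tiny hand-made 3-dimensional vectors stand in for real pretrained (e.g. biomedical) word2vec embeddings. Each eligible token is replaced by its cosine-nearest neighbour with probability `prob`. A `protected` set keeps the answer tokens unchanged, so that an extractive answer span stays valid. The names `TOY_VECTORS`, `nearest_neighbour`, `substitute`, `prob` and `protected` are hypothetical illustrations. This is not the authors' released code or their exact procedure.

```python
import math
import random

# Hypothetical toy embeddings; a real setup would load pretrained word2vec vectors.
TOY_VECTORS = {
    "disease": [0.9, 0.1, 0.0],
    "disorder": [0.85, 0.15, 0.05],
    "gene": [0.1, 0.9, 0.1],
    "protein": [0.15, 0.85, 0.2],
    "causes": [0.0, 0.2, 0.9],
    "triggers": [0.05, 0.25, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbour(word, vectors):
    """Return the most cosine-similar other word in the vocabulary, or None."""
    if word not in vectors:
        return None
    candidates = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return max(candidates)[1] if candidates else None


def substitute(tokens, vectors, protected, prob=0.5, rng=None):
    """Replace each non-protected, in-vocabulary token by its nearest
    neighbour with probability `prob` (assumed augmentation policy)."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        if tok in protected or tok not in vectors or rng.random() >= prob:
            out.append(tok)
        else:
            out.append(nearest_neighbour(tok, vectors) or tok)
    return out


question = ["which", "gene", "causes", "the", "disease"]
# Keep the (assumed) answer-related token fixed; always substitute the rest.
print(substitute(question, TOY_VECTORS, protected={"gene"}, prob=1.0))
```

With `prob=1.0` every eligible token is swapped, so the example prints `['which', 'gene', 'triggers', 'the', 'disorder']`. Out-of-vocabulary words such as "which" are left untouched. Lower values of `prob` yield milder perturbations, and running the function several times with different seeds would produce several augmented copies of one training instance.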
