Paper Title
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models
Paper Authors
Paper Abstract
Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus (Ahmad et al., 2019). This paper presents MultiReQA, a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets. We provide the first systematic retrieval-based evaluation over these datasets using two supervised neural models, based on fine-tuning BERT and USE-QA models respectively, as well as a surprisingly strong information retrieval baseline, BM25. Five of these tasks contain both training and test data, while three contain test data only. Performance on the five tasks with training data shows that while a general model covering all domains is achievable, the best performance is often obtained by training exclusively on in-domain data.
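To make the ReQA setup concrete, the following is a minimal sketch of ranking sentence-level answer candidates against a question with a BM25 baseline, in the spirit of the BM25 baseline mentioned above. The `rank_bm25` package, whitespace tokenization, the toy corpus, and the use of reciprocal rank as the illustrative metric are all assumptions for this sketch, not details from the paper or its evaluation code.

```python
# Illustrative ReQA-style baseline: score candidate answer sentences for a
# question with BM25 and report the rank of the gold sentence.
# Assumptions: the rank_bm25 package, whitespace tokenization, toy data.
from rank_bm25 import BM25Okapi

# Toy answer corpus: each entry is one candidate answer sentence.
candidates = [
    "BM25 is a bag-of-words ranking function used in information retrieval.",
    "BERT is a transformer encoder pretrained with masked language modeling.",
    "The capital of France is Paris.",
]
tokenized_candidates = [s.lower().split() for s in candidates]
bm25 = BM25Okapi(tokenized_candidates)


def rank_answers(question: str, top_n: int = 3) -> list[str]:
    """Return candidate sentences ordered by BM25 score for the question."""
    return bm25.get_top_n(question.lower().split(), candidates, n=top_n)


def reciprocal_rank(question: str, gold_answer: str) -> float:
    """Illustrative per-question metric: 1 / rank of the gold sentence."""
    ranking = rank_answers(question, top_n=len(candidates))
    return 1.0 / (ranking.index(gold_answer) + 1)


if __name__ == "__main__":
    q = "Which ranking function is used in information retrieval?"
    print(rank_answers(q, top_n=1))          # best-scoring candidate sentence
    print(reciprocal_rank(q, candidates[0]))  # 1.0 if the gold sentence ranks first
```

A neural dual-encoder baseline (e.g., a fine-tuned BERT or USE-QA model) would replace the BM25 scorer with question and answer embeddings compared by dot product, but the retrieve-and-rank evaluation loop stays the same.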