Paper Title


MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Authors

Shayne Longpre, Yi Lu, Joachim Daiber

Abstract


Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero-shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.
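To illustrate the cross-language alignment the abstract describes — one question with per-language translations and a shared, language-independent answer annotation — here is a minimal sketch. The field names and record layout are illustrative assumptions, not MKQA's actual schema, and the exact-match scorer is a simplified stand-in for the paper's evaluation.

```python
# Hypothetical MKQA-style record: a single question aligned across languages,
# with answers grounded in one language-independent entity but carrying
# per-language surface forms. Field names are assumptions for illustration.
record = {
    "example_id": 42,
    "queries": {
        "en": "who wrote the novel moby dick",
        "de": "wer schrieb den roman moby dick",
        "ja": "小説『白鯨』を書いたのは誰",
    },
    "answers": {
        "en": [{"type": "entity", "text": "Herman Melville"}],
        "de": [{"type": "entity", "text": "Herman Melville"}],
        "ja": [{"type": "entity", "text": "ハーマン・メルヴィル"}],
    },
}

def exact_match(prediction: str, answers: list) -> bool:
    """Case-insensitive exact match against any accepted answer string."""
    norm = prediction.strip().lower()
    return any(norm == a["text"].strip().lower() for a in answers)

# Score one prediction per language against the aligned answer set; because
# the answers refer to the same entity, scores are comparable across languages.
preds = {"en": "Herman Melville", "de": "herman melville", "ja": "不明"}
scores = {lang: exact_match(p, record["answers"][lang])
          for lang, p in preds.items()}
```

Aligning every question and answer across all 26 languages is what lets a single system be compared language by language without relying on language-specific retrieval passages.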
