Paper Title
LiveQA: A Question Answering Dataset over Sports Live
Authors
Abstract
In this paper, we introduce LiveQA, a new question answering dataset constructed from play-by-play live broadcasts. It contains 117k multiple-choice questions written by human commentators for over 1,670 NBA games, collected from the Chinese Hupu website (https://nba.hupu.com/games). Derived from the characteristics of sports games, LiveQA can potentially test reasoning ability over timeline-based live broadcasts, which is challenging compared with existing datasets. In LiveQA, the questions require understanding the timeline, tracking events, or doing mathematical computations. Our preliminary experiments show that the dataset poses a challenging problem for question answering models: a strong baseline model achieves an accuracy of only 53.1% and cannot beat the dominant option rule. We release the code and data of this paper for future research.