Paper Title
Symlink: A New Dataset for Scientific Symbol-Description Linking
Paper Authors
Paper Abstract
Mathematical symbols and descriptions appear in various forms across document section boundaries without explicit markup. In this paper, we present a new large-scale dataset that emphasizes extracting symbols and descriptions in scientific documents. Symlink annotates scientific papers of 5 different domains (i.e., computer science, biology, physics, mathematics, and economics). Our experiments on Symlink demonstrate the challenges of the symbol-description linking task for existing models and call for further research effort in this area. We will publicly release Symlink to facilitate future research.