Paper Title

Domain-independent Extraction of Scientific Concepts from Research Articles

Authors

Arthur Brack, Jennifer D'Souza, Anett Hoppe, Sören Auer, Ralph Ewerth

Abstract

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
