Paper title
Overview of STEM Science as Process, Method, Material, and Data Named Entities
Paper authors
Paper abstract
We are faced with an unprecedented production of scholarly publications worldwide. Stakeholders in digital libraries posit that the document-based publishing paradigm has reached the limits of adequacy. Instead, structured, machine-interpretable, fine-grained scholarly knowledge publishing as Knowledge Graphs (KGs) is strongly advocated. In this work, we develop and analyze a large-scale structured dataset of STEM articles across 10 different disciplines, viz. Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Material Science, Mathematics, and Medicine. Our analysis is defined over a large-scale corpus comprising 60K abstracts structured as four scientific entities: process, method, material, and data. Thus our study presents, for the first time, an analysis of a large-scale multidisciplinary corpus under the construct of four named entity labels that are specifically defined and selected to be domain-independent as opposed to domain-specific. The work is then inadvertently a feasibility test of characterizing multidisciplinary science with domain-independent concepts. Further, to summarize the distinct facets of scientific knowledge per concept per discipline, a set of word cloud visualizations is offered. The STEM-NER-60k corpus, created in this work, comprises over 1M extracted entities from 60k STEM articles obtained from a major publishing platform and is publicly released at https://github.com/jd-coderepos/stem-ner-60k.