Paper Title

ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data

Authors

Kamal Choudhary, Mathew L. Kelley

Abstract

In this work, we present the ChemNLP library that can be used for 1) curating open access datasets for materials and chemistry literature, developing and comparing traditional machine learning, transformers and graph neural network models for 2) classifying and clustering texts, 3) named entity recognition for large-scale text-mining, 4) abstractive summarization for generating titles of articles from abstracts, 5) text generation for suggesting abstracts from titles, 6) integration with a density functional theory dataset for identifying potential candidate materials such as superconductors, and 7) web-interface development for text and reference query. We primarily use the publicly available arXiv and PubChem datasets, but the tools can be used for other datasets as well. Moreover, as new models are developed, they can be easily integrated into the library. ChemNLP is available at the websites: https://github.com/usnistgov/chemnlp and https://jarvis.nist.gov/jarvischemnlp.
