COVID-19 therapy target discovery with context-aware literature mining

Paper title

COVID-19 therapy target discovery with context-aware literature mining

Authors

Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak

Abstract

The abundance of literature related to the widespread COVID-19 pandemic is beyond manual inspection by a single expert. Developing systems capable of automatically processing tens of thousands of scientific publications, with the aim of enriching existing empirical evidence with literature-based associations, is both challenging and relevant. We propose a system for contextualizing empirical expression data by approximating relations between entities, whose representations were learned from one of the largest COVID-19-related literature corpora. To exploit a larger scientific context through transfer learning, we propose a novel embedding generation technique that leverages the SciBERT language model, pretrained on a large multi-domain corpus of scientific publications and fine-tuned for domain adaptation on the CORD-19 dataset. Manual evaluation by a medical expert and quantitative evaluation based on therapy targets identified in related work suggest that the proposed method can be successfully employed for COVID-19 therapy target discovery and that it outperforms the baseline FastText method by a large margin.
