Paper Title

Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication

Authors

Haihua Chen, Huyen Nguyen

Abstract

Citation function and citation sentiment are two essential aspects of citation content analysis (CCA), and both are useful for influence analysis and the recommendation of scientific publications. However, existing studies mostly rely on traditional machine learning methods; although deep learning techniques have also been explored, the performance improvement appears insignificant due to insufficient training data, which hinders practical application. In this paper, we propose fine-tuning the pre-trained contextual embeddings ULMFiT, BERT, and XLNet for these tasks. Experiments on three public datasets show that our strategy outperforms all baselines in terms of F1 score. For citation function identification, the XLNet model achieves 87.2%, 86.90%, and 81.6% on the DFKI, UMICH, and TKDE2019 datasets respectively, and it achieves 91.72% and 91.56% on DFKI and UMICH for citation sentiment identification. Our method can be used to enhance the influence analysis of scholars and scholarly publications.
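The approach described in the abstract, fine-tuning a pre-trained contextual encoder such as BERT with a classification head for citation function or sentiment labels, can be sketched as below. This is a minimal illustrative sketch, not the authors' actual implementation: the tiny randomly initialized config, the 3-label scheme, and the hyperparameters are assumptions chosen so the example runs without downloading pretrained weights; in practice one would load a checkpoint such as "bert-base-uncased" and tokenize real citation contexts.

```python
# Hedged sketch of fine-tuning a BERT-style model for citation function
# classification (config sizes and labels are illustrative assumptions).
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny randomly initialized config so the sketch runs offline; a real run
# would use BertForSequenceClassification.from_pretrained("bert-base-uncased", ...).
config = BertConfig(
    vocab_size=30522,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=3,  # hypothetical citation-function classes
)
model = BertForSequenceClassification(config)

# Toy batch: two "citation context" token-id sequences with gold labels.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
labels = torch.tensor([0, 2])

# One fine-tuning step: the model returns cross-entropy loss and logits.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
out = model(input_ids=input_ids, labels=labels)
out.loss.backward()
optimizer.step()
print(out.logits.shape)  # one logit vector per example, one entry per class
```

Replacing `BertConfig`/`BertForSequenceClassification` with the XLNet equivalents follows the same pattern, since the `transformers` sequence-classification heads share this interface.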
