Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language

Paper title

Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language

Authors

Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro

Abstract

Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as the evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) with the integration of KGEs, creating a new variant of the loss. Although the approach does not outperform the state of the art, represented by multilingual models, the results are encouraging: they provide a significant leap in performance over the starting model while using significantly less data.
