Paper Title
Medical Concept Normalization in User Generated Texts by Learning Target Concept Embeddings
Paper Authors
Paper Abstract
Medical concept normalization helps in discovering standard concepts in free-form text, i.e., it maps health-related mentions to standard concepts in a vocabulary. It goes well beyond simple string matching and requires a deep semantic understanding of concept mentions. Recent research approaches concept normalization as either text classification or text matching. The main drawback of existing a) text classification approaches is that they ignore valuable target concept information when learning the input concept mention representation, and b) text matching approaches is the need to separately generate target concept embeddings, which is time- and resource-consuming. Our proposed model overcomes these drawbacks by jointly learning the representations of the input concept mention and the target concepts. First, it learns the input concept mention representation using RoBERTa. Second, it computes the cosine similarity between the embedding of the input concept mention and the embeddings of all target concepts. Here, the target concept embeddings are randomly initialized and then updated during training. Finally, the target concept with the maximum cosine similarity is assigned to the input concept mention. Our model surpasses all existing methods across three standard datasets, improving accuracy by up to 2.31%.
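The following is a minimal sketch of the architecture described in the abstract, assuming PyTorch and the Hugging Face transformers library; the class name, vocabulary size, and example mention are illustrative, and details such as pooling strategy and training loss are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import RobertaModel, RobertaTokenizer

class ConceptNormalizer(nn.Module):
    """Sketch of the described model: RoBERTa encodes the concept
    mention; target concept embeddings form a randomly initialized,
    trainable matrix; the prediction is the concept whose embedding
    has the highest cosine similarity to the mention embedding."""

    def __init__(self, num_concepts: int, model_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Target concept embeddings: randomly initialized,
        # updated jointly with the encoder during training.
        self.concept_embeddings = nn.Parameter(torch.randn(num_concepts, hidden))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Assumption: use the <s> (first) token representation as the
        # mention embedding; the abstract does not state the pooling.
        mention = out.last_hidden_state[:, 0]                  # (batch, hidden)
        # Cosine similarity = dot product of L2-normalized vectors.
        mention = F.normalize(mention, dim=-1)
        concepts = F.normalize(self.concept_embeddings, dim=-1)
        return mention @ concepts.t()                          # (batch, num_concepts)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = ConceptNormalizer(num_concepts=1000)  # hypothetical concept vocabulary size
batch = tokenizer(["head is spinning a little"], return_tensors="pt")
scores = model(batch["input_ids"], batch["attention_mask"])
predicted_concept = scores.argmax(dim=-1)  # concept with maximum cosine similarity
```

At inference time, the argmax over similarity scores implements the final assignment step; for training, one natural objective would be cross-entropy over the similarity scores against the gold concept, though the abstract does not specify the loss used.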