Assessing mortality prediction through different representation models based on concepts extracted from clinical notes

Paper title

Assessing mortality prediction through different representation models based on concepts extracted from clinical notes

Authors

Hoda Memarzadeh, Nasser Ghadiri, Maryam Lotfi Shahreza

Abstract

Recent years have seen particular interest in the secondary use of electronic medical records (EMRs) to improve the quality and safety of healthcare delivery. EMRs contain large volumes of valuable clinical notes. Embedding learning converts these notes into a vector format that makes them comparable. Transformer-based representation models, which are pre-trained on large online datasets to model natural language text effectively, have recently made major advances. The quality of a learned embedding depends on how clinical notes are presented as input to the representation model. A clinical note has several sections with different levels of information value, and healthcare providers commonly use different expressions for the same concept. Existing methods feed clinical notes to representation models either directly or after initial preprocessing. In contrast, to learn a good embedding, we first identified the most essential sections of the clinical notes. We then mapped the concepts extracted from the selected sections to their standard names in the Unified Medical Language System (UMLS) and used the standard phrases corresponding to the unique concepts as input to clinical models. We ran experiments to measure the usefulness of the learned embedding vectors for hospital mortality prediction on a subset of the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset. In these experiments, clinical transformer-based representation models produced better results when given input built from the standard names of the extracted unique concepts than with other input formats. The best-performing models were BioBERT, PubMedBERT, and UmlsBERT, in that order.
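
The input-construction step described in the abstract (select sections, extract concepts, map them to standard names, keep each unique concept once) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup table, the concept IDs, the section names, and the comma-joined output format are all hypothetical stand-ins, and a real pipeline would use a UMLS-backed concept extractor rather than a hand-written dictionary.

```python
import re

# Hypothetical lookup: surface form -> (concept ID, standard name).
# The IDs are made-up placeholders, not real UMLS identifiers.
TOY_CONCEPT_TABLE = {
    "heart attack": ("TOY001", "Myocardial Infarction"),
    "myocardial infarction": ("TOY001", "Myocardial Infarction"),
    "high blood pressure": ("TOY002", "Hypertensive disease"),
    "hypertension": ("TOY002", "Hypertensive disease"),
}


def extract_concepts(section_text, table):
    """Return (concept_id, standard_name) pairs in order of appearance."""
    text = section_text.lower()
    hits = []
    for surface, concept in table.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), concept))
    hits.sort(key=lambda hit: hit[0])  # restore reading order
    return [concept for _, concept in hits]


def unique_standard_names(concepts):
    """Keep one standard name per concept ID, preserving first-seen order."""
    seen = set()
    names = []
    for concept_id, name in concepts:
        if concept_id not in seen:
            seen.add(concept_id)
            names.append(name)
    return names


def build_model_input(note_sections, selected_sections, table):
    """Build the text passed to a representation model from selected sections."""
    concepts = []
    for section in selected_sections:
        concepts.extend(extract_concepts(note_sections.get(section, ""), table))
    return ", ".join(unique_standard_names(concepts))


# Hypothetical note; the section names are assumptions for illustration only.
example_note = {
    "chief complaint": "Pt admitted with heart attack.",
    "history of present illness": (
        "Prior myocardial infarction and high blood pressure; "
        "hypertension poorly controlled."
    ),
    "social history": "Lives alone.",
}
example_input = build_model_input(
    example_note,
    ["chief complaint", "history of present illness"],
    TOY_CONCEPT_TABLE,
)
print(example_input)
```

In this toy note, four mentions written in different ways collapse to two unique concepts, so the model would receive one standard name per concept instead of the raw free text.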
