Competence-based Multimodal Curriculum Learning for Medical Report Generation

Title

Competence-based Multimodal Curriculum Learning for Medical Report Generation

Authors

Liu, Fenglin; Ge, Shen; Zou, Yuexian; Wu, Xian

Abstract

Medical report generation, which aims to produce long and coherent descriptions of medical images, has recently attracted growing research interest. Unlike general image captioning, medical report generation is more challenging for data-driven neural models, mainly because of 1) serious data bias and 2) limited medical data. To alleviate the data bias and make the best use of the available data, we propose a Competence-based Multimodal Curriculum Learning framework (CMCL). Specifically, CMCL simulates the learning process of radiologists and optimizes the model step by step. First, CMCL estimates the difficulty of each training instance and evaluates the competence of the current model. Second, CMCL selects the most suitable batch of training instances given the current model competence. By iterating over these two steps, CMCL gradually improves the model's performance. Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
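
The two-step loop described in the abstract (score difficulty and competence, then pick a batch that fits the current competence) can be illustrated with a minimal sketch. This is not the authors' implementation: the square-root competence schedule, the rank-based difficulty normalization, and the names `competence`, `difficulty_ranks`, and `select_batch` are all assumptions made for illustration, and the difficulty scores are made-up numbers.

```python
import math
import random


def competence(step, total_steps, initial=0.1):
    """Assumed square-root schedule: grows from `initial` to 1.0 over total_steps."""
    progress = min(1.0, step / total_steps)
    if progress >= 1.0:
        return 1.0
    return min(1.0, math.sqrt(progress * (1.0 - initial ** 2) + initial ** 2))


def difficulty_ranks(difficulties):
    """Map raw difficulty scores to ranks in (0, 1]; the easiest instance gets the lowest rank."""
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    ranks = [0.0] * len(difficulties)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(difficulties)
    return ranks


def select_batch(ranks, current_competence, batch_size, rng):
    """Sample a batch from instances whose difficulty rank is within the current competence."""
    eligible = [i for i, r in enumerate(ranks) if r <= current_competence]
    if not eligible:
        # Fall back to the single easiest instance so training can always start.
        eligible = [min(range(len(ranks)), key=lambda i: ranks[i])]
    return rng.sample(eligible, min(batch_size, len(eligible)))


# Hypothetical difficulty scores for six training instances.
difficulties = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
ranks = difficulty_ranks(difficulties)
rng = random.Random(0)
for step in (0, 50, 100):
    c = competence(step, total_steps=100)
    batch = select_batch(ranks, c, batch_size=2, rng=rng)
    print(step, round(c, 3), sorted(batch))
```

Early in training only the easiest instances are eligible; as the assumed competence value rises toward 1.0, harder instances enter the sampling pool until the full dataset is available.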
