EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision

Paper Title

EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision

Authors

Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Jimeng Sun

Abstract

Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multimodal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limited availability and privacy concerns and are much harder to synthesize. To tackle this joint synthesis challenge, we propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing X-ray images and corresponding free-text reports, all conditional on diagnosis labels. EMIXER is a conditional generative adversarial model that works by 1) generating an image based on a label, 2) encoding the image to a hidden embedding, 3) producing the corresponding text from the image embedding via a hierarchical decoder, and 4) assessing both the image and the corresponding text with a joint discriminator. EMIXER also enables self-supervision to leverage vast amounts of unlabeled data. Extensive experiments with real X-ray report data illustrate how data augmentation using synthesized multimodal samples can improve the performance of a variety of supervised tasks, including COVID-19 X-ray classification with very limited samples. The quality of the generated images and reports is also confirmed by radiologists. We quantitatively show that EMIXER-generated synthetic datasets can augment X-ray image classification and report generation models to achieve improvements of 5.94% and 6.9% over models trained only on real data samples. Taken together, our results highlight the promise of generative models to advance clinical machine learning.
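
The four components listed in the abstract can be read as a simple data flow: label to image, image to embedding, embedding to report, and (image, report, label) to a realism score. The following is a minimal sketch of that flow only, not the authors' implementation. Every name, dimension, and the tiny vocabulary are hypothetical, the "networks" are untrained random linear maps in pure Python, and adversarial training and self-supervision are omitted entirely.

```python
import random

# Hypothetical toy sizes and vocabulary, chosen only for illustration.
NUM_LABELS, NOISE_DIM, IMG_PIXELS, EMBED_DIM = 3, 4, 16, 5
VOCAB = ["no", "acute", "findings", "opacity", "."]


def linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weights standing in for a trained network."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def one_hot(label):
    return [1.0 if i == label else 0.0 for i in range(NUM_LABELS)]


class ToyPipeline:
    """Data-flow sketch of the four steps; no component is trained."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.w_gen = linear(NUM_LABELS + NOISE_DIM, IMG_PIXELS, self.rng)
        self.w_enc = linear(IMG_PIXELS, EMBED_DIM, self.rng)
        self.w_sent = linear(EMBED_DIM, EMBED_DIM, self.rng)   # sentence level
        self.w_word = linear(EMBED_DIM, len(VOCAB), self.rng)  # word level
        self.w_disc = linear(IMG_PIXELS + len(VOCAB) + NUM_LABELS, 1, self.rng)

    def generate_image(self, label):
        # Step 1: image conditioned on a diagnosis label plus random noise.
        noise = [self.rng.gauss(0, 1) for _ in range(NOISE_DIM)]
        return apply(self.w_gen, one_hot(label) + noise)

    def encode_image(self, image):
        # Step 2: image to hidden embedding.
        return apply(self.w_enc, image)

    def decode_report(self, embedding, n_sentences=2, n_words=3):
        # Step 3: two-level decoding. A sentence-level state is updated per
        # sentence, then the top-scoring words are emitted for that sentence.
        report, state = [], embedding
        for _ in range(n_sentences):
            state = apply(self.w_sent, state)
            scores = apply(self.w_word, state)
            ranked = sorted(range(len(VOCAB)), key=lambda i: -scores[i])
            report.append([VOCAB[i] for i in ranked[:n_words]])
        return report

    def discriminate(self, image, report, label):
        # Step 4: one score for the (image, text, label) triple, using a
        # bag-of-words summary of the report.
        counts = [float(sum(s.count(w) for s in report)) for w in VOCAB]
        return apply(self.w_disc, image + counts + one_hot(label))[0]


pipeline = ToyPipeline(seed=0)
image = pipeline.generate_image(label=1)
report = pipeline.decode_report(pipeline.encode_image(image))
score = pipeline.discriminate(image, report, label=1)
```

In a real system each random linear map would be replaced by a trained neural network, and the discriminator score would drive the adversarial loss for both the image generator and the report decoder.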
