From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Paper Title

From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Authors

Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

Abstract

Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited clinical utility because each focuses narrowly on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered, since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. Specifically, we train a single FL classification model ("global") using two separate CXR datasets, one annotated for the presence of pneumonia and the other for the presence of pneumothorax (two common and life-threatening conditions), so that the model is capable of diagnosing both. We compare the performance of the global FL model with models trained separately on each dataset ("baseline") for two different model architectures. On a standard, naive 3-layer CNN architecture, the global FL model achieved AUROC of 0.84 and 0.81 for pneumonia and pneumothorax, respectively, compared to 0.85 and 0.82, respectively, for the two baseline models (p>0.05). Similarly, on a pretrained DenseNet121 architecture, the global FL model achieved AUROC of 0.88 and 0.91 for pneumonia and pneumothorax, respectively, compared to 0.89 and 0.91, respectively, for the two baseline models (p>0.05). Our results suggest that FL can be used to create global "meta" models that make toy datasets from Kaggle clinically useful, a step towards bridging the gap from bench to bedside.
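The abstract does not state how the two sites' updates are combined into the global model. One common choice in federated learning is federated averaging (FedAvg), in which each site trains locally and a server averages the resulting parameters, weighted by each site's number of training samples. The following is a minimal sketch of that aggregation step only, assuming FedAvg; the function name, parameter names, weight values, and sample counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# A model's parameters, flattened per layer (hypothetical representation).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Return the sample-size-weighted average of per-client parameters."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Average each parameter position across clients, weighted by data size.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical parameters after one round of local training at each site.
pneumonia_site = {"conv.weight": [0.25, 0.5], "fc.bias": [1.0]}
pneumothorax_site = {"conv.weight": [0.75, 0.0], "fc.bias": [3.0]}

# Hypothetical training-set sizes: 100 and 300 images.
global_weights = federated_average([pneumonia_site, pneumothorax_site], [100, 300])
print(global_weights)
```

In a full training loop, the server would send `global_weights` back to both sites, each site would train further on its own dataset, and the averaging would repeat for a number of rounds. The images themselves never leave the site that holds them.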
