Paper Title
Federated Semi-Supervised Learning for COVID Region Segmentation in Chest CT using Multi-National Data from China, Italy, Japan
Paper Authors
Paper Abstract
The recent outbreak of COVID-19 has created an urgent need for reliable diagnosis and management of SARS-CoV-2 infection. As a complementary tool, chest CT has been shown to reveal visual patterns characteristic of COVID-19, which has definite value at several stages of the disease course. To facilitate CT analysis, recent efforts have focused on computer-aided characterization and diagnosis and have shown promising results. However, domain shift of data across clinical data centers poses a serious challenge when deploying learning-based models. In this work, we attempt to address this challenge via federated and semi-supervised learning. A multi-national database consisting of 1704 scans from three countries is adopted to study the performance gap that arises when a model is trained on one dataset and applied to another. Expert radiologists manually delineated 945 scans for COVID-19 findings. To handle the variability in both the data and the annotations, a novel federated semi-supervised learning technique is proposed to fully utilize all available data (with or without annotations). Federated learning avoids the need for sensitive data sharing, which makes it favorable for institutions and nations with strict regulatory policies on data privacy. Moreover, semi-supervision potentially reduces the annotation burden under a distributed setting. The proposed framework is shown to be effective compared to fully supervised scenarios with conventional data sharing instead of model weight sharing.
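The abstract contrasts model weight sharing with conventional data sharing but does not state how the weights are combined. As a minimal sketch of the general idea, assuming a size-weighted average in the style of federated averaging (an assumption, not necessarily the authors' aggregation rule), a server could merge per-site weights as follows. The function name `federated_average`, the layer name `conv.weight`, and the site sizes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A model is represented here as a mapping from layer name to a flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Combine per-site weights, weighting each site by its number of local scans.

    Only weights leave each site; the scans themselves are never shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example with three hypothetical sites (sizes are illustrative, not from the paper).
site_a = {"conv.weight": [1.0, 2.0]}
site_b = {"conv.weight": [3.0, 4.0]}
site_c = {"conv.weight": [5.0, 6.0]}
global_weights = federated_average([site_a, site_b, site_c], [100, 100, 200])
print(global_weights)  # {'conv.weight': [3.5, 4.5]}
```

In a semi-supervised variant, sites without annotations could still contribute weights trained on unlabeled scans; how the paper does this is not described in the abstract, so it is left out of the sketch.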