Paper Title
BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification
Authors
Abstract
Deep learning methods have shown outstanding classification accuracy in medical imaging problems, largely because of the availability of large-scale datasets manually annotated with clean labels. However, given the high cost of such manual annotation, new medical imaging classification problems may need to rely on machine-generated noisy labels extracted from radiology reports. Indeed, many Chest X-ray (CXR) classifiers have already been modelled from datasets with noisy labels, but their training procedures are generally not robust to noisy-label samples, leading to sub-optimal models. Furthermore, CXR datasets are mostly multi-label, so current noisy-label learning methods designed for multi-class problems cannot be easily adapted. In this paper, we propose a new method designed for noisy multi-label CXR learning that detects and smoothly re-labels samples from the dataset; the re-labelled dataset is then used to train common multi-label classifiers. The proposed method optimises a bag of multi-label descriptors (BoMD) to promote their similarity with the semantic descriptors that BERT models produce from the multi-label image annotations. Our experiments on diverse noisy multi-label training sets and clean testing sets show that our model has state-of-the-art accuracy and robustness in many CXR multi-label classification benchmarks.
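
The abstract does not give the re-labelling rule, so the following is only a minimal sketch of the general idea: compare a bag of image descriptors with per-class text embeddings and blend the resulting scores into the noisy label. It is not the authors' implementation. The function names, the max-over-bag pooling, the clipping to [0, 1], and the mixing weight `alpha` are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (m, d) and rows of b (c, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def smooth_relabel(noisy_label, bag, class_embeddings, alpha=0.5):
    """Blend a noisy multi-hot label with descriptor-to-class similarity scores.

    noisy_label:      (c,) array of 0/1 labels for one image.
    bag:              (m, d) array, a hypothetical bag of m image descriptors.
    class_embeddings: (c, d) array, hypothetical text embeddings of the class names.
    alpha:            hypothetical mixing weight in [0, 1] (assumption, not from the paper).
    """
    sim = cosine_similarity(bag, class_embeddings)  # shape (m, c)
    best = sim.max(axis=0)                          # best-matching descriptor per class
    score = np.clip(best, 0.0, 1.0)                 # keep scores in [0, 1]
    return alpha * noisy_label + (1.0 - alpha) * score

# Toy example: 3 classes, 2 descriptors, 3-dimensional embeddings.
class_embeddings = np.eye(3)
bag = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
noisy_label = np.array([1.0, 0.0, 0.0])
soft = smooth_relabel(noisy_label, bag, class_embeddings)
```

In this toy case the second class, absent from the noisy label, receives a soft label of 0.5 because one descriptor in the bag matches its embedding. The third class stays at 0.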