Paper title
Sources of performance variability in deep learning-based polyp detection
Paper authors
Paper abstract
Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening. Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy computer vision challenge (EndoCV) on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices. Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance. Our work could be a first step towards reconsidering common validation strategies in automatic colon cancer screening applications.
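To make the hyperparameter sensitivity mentioned in the abstract concrete, the following minimal sketch shows how one and the same predicted box can count as a hit or a miss depending on the localization criterion and its threshold. The abstract does not name the criteria or thresholds that were studied, so everything here is an assumption for illustration: the intersection-over-union (IoU) criterion, the center-point criterion, the threshold values 0.5 and 0.3, the helper names `iou`, `is_hit`, and `center_inside`, and the box coordinates.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); illustrative convention.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: Box, ref: Box, threshold: float) -> bool:
    """IoU-based localization criterion (hypothetical helper)."""
    return iou(pred, ref) >= threshold


def center_inside(pred: Box, ref: Box) -> bool:
    """Point-based criterion: is the predicted box center inside the reference box?"""
    cx = (pred[0] + pred[2]) / 2.0
    cy = (pred[1] + pred[3]) / 2.0
    return ref[0] <= cx <= ref[2] and ref[1] <= cy <= ref[3]


# Made-up example boxes, not data from the study.
REF: Box = (10.0, 10.0, 50.0, 50.0)
PRED: Box = (20.0, 20.0, 60.0, 60.0)

if __name__ == "__main__":
    print(f"IoU = {iou(PRED, REF):.3f}")
    for threshold in (0.5, 0.3):
        print(f"hit at IoU >= {threshold}: {is_hit(PRED, REF, threshold)}")
    print(f"hit by center-point criterion: {center_inside(PRED, REF)}")
```

In this toy case the prediction overlaps the reference with an IoU of roughly 0.39, so it is a miss at a threshold of 0.5 but a hit at 0.3 or under the center-point criterion. Any metric built on such hit/miss decisions inherits this dependence on the chosen criterion.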