Paper Title
Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?
Paper Authors
Paper Abstract
Being able to explain predictions to clinical end-users is a necessity for leveraging the power of artificial intelligence (AI) models in clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation; it highlights the features important for an AI model's prediction. However, it is unknown how well heatmaps perform at explaining decisions on multi-modal medical images, where each image modality or channel visualizes distinct clinical information about the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the modality-specific feature importance (MSFI) metric. MSFI encodes the clinical pattern of interpreting multi-modal images and their explanations: modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation using computational methods and a clinician user study. Results show that the 16 examined heatmap algorithms failed to fulfill clinical requirements: they could not correctly indicate the AI model's decision process or decision quality. The evaluation and the MSFI metric can guide the design and selection of XAI algorithms to meet clinical requirements for multi-modal explanation.
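To make the idea behind MSFI concrete, the sketch below shows one plausible way such a score could be computed: the modality-importance-weighted fraction of heatmap attribution that falls inside each modality's ground-truth feature mask. This is only an illustrative reading of the abstract, not the paper's exact formulation; the function name `msfi`, the argument names, and the specific weighting scheme are assumptions made for this example.

```python
import numpy as np

def msfi(heatmap, feature_masks, modality_weights):
    """Illustrative modality-specific feature importance score (assumption, not the paper's definition).

    heatmap          : dict mapping modality name -> 2D array of attribution values
    feature_masks    : dict mapping modality name -> binary mask of clinically
                       important, modality-specific features
    modality_weights : dict mapping modality name -> importance weight
                       (e.g., from clinician annotation)

    Returns a score in [0, 1]; higher means attribution concentrates on the
    important modalities and on the important regions within them.
    """
    num, den = 0.0, 0.0
    for m, w in modality_weights.items():
        h = np.clip(heatmap[m], 0, None)  # keep positive attribution only
        total = h.sum()
        # fraction of this modality's attribution falling inside its feature mask
        localized = h[feature_masks[m].astype(bool)].sum() / total if total > 0 else 0.0
        num += w * localized
        den += w
    return num / den if den > 0 else 0.0


# Toy two-modality example: attribution lands mostly on the masked region of "flair".
hm = {"flair": np.array([[0.9, 0.1], [0.0, 0.0]]), "t1": np.zeros((2, 2))}
masks = {"flair": np.array([[1, 0], [0, 0]]), "t1": np.zeros((2, 2), dtype=int)}
weights = {"flair": 1.0, "t1": 0.0}
print(msfi(hm, masks, weights))  # 0.9
```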