Paper Title
Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
Paper Authors
Paper Abstract
Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure the explanation technique is both technically sound and clinically useful, but specific support is lacking to achieve this goal. To bridge the research gap, we propose the Clinical XAI Guidelines that consist of five criteria a clinical XAI needs to be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, its specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation on a novel problem of multi-modal medical image explanation with two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly-used heatmap XAI techniques were evaluated and found to be insufficient for clinical use due to their failure in G3 and G4. Our evaluation demonstrated the use of Clinical XAI Guidelines to support the design and evaluation of clinically viable XAI.