Paper Title

Evaluating Search System Explainability with Psychometrics and Crowdsourcing

Paper Authors

Catherine Chen, Carsten Eickhoff

Paper Abstract

As information retrieval (IR) systems, such as search engines and conversational agents, become ubiquitous in various domains, the need for transparent and explainable systems grows to ensure accountability, fairness, and unbiased results. Despite recent advances in explainable AI and IR techniques, there is no consensus on the definition of explainability. Existing approaches often treat it as a singular notion, disregarding the multidimensional definition postulated in the literature. In this paper, we use psychometrics and crowdsourcing to identify human-centered factors of explainability in Web search systems and introduce SSE (Search System Explainability), an evaluation metric for explainable IR (XIR) search systems. In a crowdsourced user study, we demonstrate SSE's ability to distinguish between explainable and non-explainable systems, showing that systems with higher scores indeed indicate greater interpretability. We hope that aside from these concrete contributions to XIR, this line of work will serve as a blueprint for similar explainability evaluation efforts in other domains of machine learning and natural language processing.
