Paper title
Relevance Judgment Convergence Degree -- A Measure of Inconsistency among Assessors for Information Retrieval
Authors
Abstract
Human assessors' relevance judgments are inherently subjective and dynamic when evaluation datasets are created for Information Retrieval (IR) systems. Nevertheless, the relevance judgments of a small group of experts are usually taken as ground truth to "objectively" evaluate the performance of IR systems. A recent trend is to employ a group of judges, for example through outsourcing, to alleviate the potential bias that stems from relying on a single expert's judgment. However, different judges may hold different opinions and may not agree with each other, and this inconsistency in human relevance judgments may affect IR system evaluation results. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation between the proposed RJCD score and the performance differences between two IR systems.