Paper Title

Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data

Paper Authors

Shuqi Xu, Manuel Sebastian Mariani, Linyuan Lü, Matúš Medo

Paper Abstract

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
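The abstract does not spell out the "simple transformation" that suppresses PageRank's and LeaderRank's age bias. A common approach in this line of work is to compare each item's score only with items of similar age, for example by z-scoring within a sliding window over publication order. The sketch below illustrates that idea; the function name `rescale_by_age`, the window size, the damping factor, and the toy network are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import networkx as nx


def rescale_by_age(scores, order, window=100):
    """Z-score each item's metric value against items of similar age.

    scores: dict mapping node -> metric value (e.g., PageRank score)
    order:  list of nodes sorted by publication time (oldest first)
    window: number of similarly aged items used for normalization
    """
    values = np.array([scores[n] for n in order], dtype=float)
    rescaled = {}
    half = window // 2
    for i, node in enumerate(order):
        # Center a window of `window` items on position i, clipped at the ends.
        lo = max(0, i - half)
        hi = min(len(order), lo + window)
        lo = max(0, hi - window)
        peers = values[lo:hi]
        std = peers.std()
        rescaled[node] = (values[i] - peers.mean()) / std if std > 0 else 0.0
    return rescaled


# Toy usage on a random directed graph standing in for a citation network
# (edges point from citing to cited items; node index stands in for time order).
G = nx.gnp_random_graph(500, 0.02, directed=True, seed=1)
pr = nx.pagerank(G, alpha=0.5)   # a low damping factor is often used for citation data
order = sorted(G.nodes())
pr_rescaled = rescale_by_age(pr, order, window=50)
```

Rescaling against same-age peers removes the systematic advantage of older items (which have had more time to accumulate citations and PageRank score), so rankings built from the rescaled values can be compared fairly across publication years.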
