Paper Title

On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods

Authors

Max Berrendorf, Evgeniy Faerman, Laurent Vermue, Volker Tresp

Abstract

In this work, we take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment. In the current experimental setting, multiple different scores are employed to assess different aspects of model performance. We analyze the informativeness of these evaluation measures and identify several shortcomings. In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets. Moreover, we demonstrate that varying the size of the test set automatically has an impact on the performance of the same model according to the metrics commonly used for the Entity Alignment task. We show that this leads to various problems in the interpretation of results, which may support misleading conclusions. Therefore, we propose adjustments to the evaluation and demonstrate empirically how this supports a fair, comparable, and interpretable assessment of model performance. Our code is available at https://github.com/mberr/rank-based-evaluation.
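
To make the test-set-size issue concrete, the following is a minimal sketch, not the authors' implementation: the helper names and the specific choice of normalizing the mean rank by its expectation under uniformly random scoring are illustrative assumptions based on the abstract. It shows how the raw mean rank of the same chance-level baseline shifts by an order of magnitude with the number of ranking candidates, while a size-adjusted score remains comparable across settings.

```python
# Sketch: why raw rank-based scores depend on the candidate set size,
# and how normalizing by the expected rank under random scoring helps.
# (Illustrative only; function names are assumptions, not the paper's API.)
import numpy as np

rng = np.random.default_rng(0)


def mean_rank(ranks: np.ndarray) -> float:
    """Plain mean rank (lower is better; scales with the candidate set size)."""
    return float(ranks.mean())


def adjusted_mean_rank(ranks: np.ndarray, num_candidates: int) -> float:
    """Mean rank divided by its expectation under uniformly random scoring.

    For a uniform random ranking over ``num_candidates`` candidates the
    expected rank is (num_candidates + 1) / 2, so values near 1 indicate
    chance-level performance regardless of the dataset size.
    """
    expected = (num_candidates + 1) / 2.0
    return float(ranks.mean() / expected)


# Simulate the same "random" model on two test settings that differ only
# in the number of candidates to rank against:
for num_candidates in (1_000, 15_000):
    ranks = rng.integers(low=1, high=num_candidates + 1, size=5_000)
    print(
        f"candidates={num_candidates:6d}  "
        f"MR={mean_rank(ranks):8.1f}  "
        f"AMR={adjusted_mean_rank(ranks, num_candidates):.3f}"
    )
# The raw mean rank differs by an order of magnitude between the two
# settings, while the adjusted score stays close to 1.0 in both.
```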
