Evolving Multi-label Classification Rules by Exploiting High-order Label Correlation

Title

Evolving Multi-label Classification Rules by Exploiting High-order Label Correlation

Authors

Nazmi, Shabnam; Yan, Xuyang; Homaifar, Abdollah; Doucette, Emily

Abstract

In multi-label classification tasks, each problem instance is associated with multiple classes simultaneously. In such settings, the correlations between labels carry valuable information that can be used to build more accurate classification models. Label correlation can be exploited at different levels, for example by capturing pairwise correlations or by exploiting higher-order correlations. Although the high-order approach is better able to model the correlations, it is computationally more demanding and has scalability issues. This paper aims to exploit high-order label correlation within subsets of labels using a supervised learning classifier system (UCS). To this end, the label powerset (LP) strategy is employed. Predictions are then aggregated within the set of labels relevant to an unseen instance, which increases the prediction capability of the LP method in the presence of unseen labelsets. The exact match ratio and Hamming loss are used to evaluate rule performance, and the expected fitness value of a classifier is investigated for both metrics. A computational complexity analysis of the proposed algorithm is also provided. The experimental results of the proposed method are compared with those of other well-known LP-based methods on multiple benchmark datasets, and they confirm its competitive performance.
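The abstract names three ingredients:

- the label powerset (LP) transformation;
- aggregation of predictions, so that LP can output labelsets not seen in training;
- the exact match ratio and Hamming loss metrics.

The Python sketch below illustrates each one in isolation, under stated assumptions. It is not the authors' UCS-based algorithm, and all function names are hypothetical. In particular, `aggregate_labelsets` assumes a simple fitness-weighted per-label vote with a 0.5 threshold. That is one plausible aggregation scheme, not necessarily the one used in the paper.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Labelset = FrozenSet[int]


def to_label_powerset(Y: Sequence[Sequence[int]]) -> Tuple[List[int], Dict[Labelset, int]]:
    """LP transformation: map each distinct training labelset to one multi-class id."""
    class_of: Dict[Labelset, int] = {}
    y_lp: List[int] = []
    for labels in Y:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)  # new labelset -> next unused class id
        y_lp.append(class_of[key])
    return y_lp, class_of


def aggregate_labelsets(votes: Sequence[Tuple[Labelset, float]],
                        n_labels: int, threshold: float = 0.5) -> Labelset:
    """Hypothetical aggregation (assumption, not the paper's rule): each matching
    rule votes for its labelset with weight = fitness; a label is kept if its
    weighted share of the total vote reaches `threshold`."""
    total = sum(w for _, w in votes)
    score = [0.0] * n_labels
    for labelset, w in votes:
        for j in labelset:
            score[j] += w
    # `total > 0` guards against division by zero when no rule matched.
    return frozenset(j for j in range(n_labels)
                     if total > 0 and score[j] / total >= threshold)


def exact_match_ratio(true: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Fraction of instances whose predicted labelset equals the true one exactly."""
    return sum(frozenset(t) == frozenset(p) for t, p in zip(true, pred)) / len(true)


def hamming_loss(true: Sequence[Sequence[int]], pred: Sequence[Sequence[int]],
                 n_labels: int) -> float:
    """Fraction of (instance, label) pairs predicted incorrectly."""
    # Symmetric difference = labels that are missed or wrongly added.
    mismatches = sum(len(frozenset(t) ^ frozenset(p)) for t, p in zip(true, pred))
    return mismatches / (len(true) * n_labels)


if __name__ == "__main__":
    y_lp, class_of = to_label_powerset([{0, 1}, {1, 2}, {0, 1}, {2}])
    print(y_lp)  # [0, 1, 0, 2]
    # Two equally fit rules advocating seen labelsets {0,1} and {1,2}:
    print(sorted(aggregate_labelsets([(frozenset({0, 1}), 0.5),
                                      (frozenset({1, 2}), 0.5)], 3)))  # [0, 1, 2]
    print(exact_match_ratio([{0, 1}, {1, 2}], [{0, 1}, {1}]))  # 0.5
```

In the example, the training labelsets {0,1}, {1,2}, {0,1} and {2} give three LP classes. Aggregating two equally weighted votes for {0,1} and {1,2} yields {0,1,2}. Plain LP could never predict that labelset, because it never appears in training. The exact match ratio gives no credit for a partially correct labelset. The Hamming loss, by contrast, counts each wrong label separately, which is why the two metrics can rank rules differently.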