Paper Title

Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

Authors

Jose J. Valero-Mas, Antonio Javier Gallego, Pablo Alonso-Jiménez, Xavier Serra

Abstract

Prototype Generation (PG) methods are typically used to improve the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling large corpora. Such approaches aim to generate a reduced version of the corpus without decreasing classification performance relative to the initial set. Despite their wide application in multiclass scenarios, very few works have proposed PG methods for the multilabel space. In this regard, this work presents novel adaptations of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel $k$NN-based classifiers, 12 corpora covering a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results show that the proposed adaptations significantly improve, in terms of both efficiency and classification performance, on the only reference multilabel PG work in the literature as well as on the case in which no PG method is applied, while also exhibiting statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either efficiency or efficacy through their configuration, depending on the target scenario, hence covering a wide area of the solution space not previously filled by other works.
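
To make the idea of data reduction for $k$NN concrete, the following is a minimal sketch of a deliberately naive multilabel PG baseline: every distinct label set in the corpus is replaced by a single prototype, the mean of the samples that carry it. This is an illustrative assumption only and is not one of the four adapted strategies evaluated in the paper; the function names `labelset_centroids` and `knn_predict` are hypothetical.

```python
import numpy as np


def labelset_centroids(X, Y):
    """Naive multilabel PG (illustrative assumption, not the paper's method).

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label-indicator matrix.
    Returns one prototype (the feature mean) per distinct label set.
    """
    # Distinct label rows plus, for every sample, the index of its row.
    labelsets, inverse = np.unique(Y, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep it 1-D across NumPy versions
    prototypes = np.stack(
        [X[inverse == i].mean(axis=0) for i in range(len(labelsets))]
    )
    return prototypes, labelsets


def knn_predict(prototypes, labelsets, x, k=1):
    """Predict the label set of x from its k nearest prototypes."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # A label is predicted when at least half of the k neighbours carry it.
    return (labelsets[nearest].mean(axis=0) >= 0.5).astype(int)
```

With such a reduced set, a query is compared against one prototype per label set rather than against every original sample, which is the efficiency gain PG aims for. Preserving classification performance while doing so is the harder part that the strategies in the paper address.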
