Paper Title

A Robust Experimental Evaluation of Automated Multi-Label Classification Methods

Authors

de Sá, Alex G. C., Pimenta, Cristiano G., Pappa, Gisele L., Freitas, Alex A.

Abstract

Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. As AutoML has progressed, several effective methods have been introduced, especially for traditional classification and regression problems. Despite the success of AutoML, several issues remain open. One issue, in particular, is the limited ability of AutoML methods to deal with different types of data. Based on this scenario, this paper applies AutoML to multi-label classification (MLC) problems. In MLC, each example can be simultaneously associated with several class labels, unlike the standard classification task, where an example is associated with just one class label. In this work, we provide a general comparison of five automated multi-label classification methods -- two evolutionary methods, one Bayesian optimization method, one random search and one greedy search -- on 14 datasets and three designed search spaces. Overall, we observe that the most prominent method is the one based on a canonical grammar-based genetic programming (GGP) search method, namely Auto-MEKA$_{GGP}$. Auto-MEKA$_{GGP}$ achieved the best average results in our comparison and was statistically better than all the other methods across the different search spaces and evaluation measures, except when compared to the greedy search method.
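To make the multi-label setting concrete, the sketch below (plain Python, not the paper's MEKA-based tooling; all names here are illustrative) represents each example's label set as a binary vector over four labels and computes two standard MLC evaluation measures: Hamming loss and exact-match accuracy.

```python
# Minimal sketch of the multi-label classification setting: each example
# carries a binary label vector, so it may belong to several classes at once.

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly."""
    total = sum(len(t) for t in y_true)
    wrong = sum(ti != pi
                for t, p in zip(y_true, y_pred)
                for ti, pi in zip(t, p))
    return wrong / total

def exact_match(y_true, y_pred):
    """Fraction of examples whose entire label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Three examples, four labels each; unlike single-label classification,
# a row may have several 1s (example 3 carries three labels).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]

print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 12 -> ~0.167
print(exact_match(y_true, y_pred))   # only example 1 fully correct -> ~0.333
```

The two measures disagree by design: Hamming loss rewards partially correct predictions per label, while exact match requires the whole label set to be right, which is why MLC comparisons (as in this paper) report results across multiple measures.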
