AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Paper title

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Authors

Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola

Abstract

We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
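The stacking idea mentioned in the abstract can be illustrated with a toy sketch. This is not AutoGluon's implementation: the two base learners, the single blend weight used as the second layer, the synthetic data, and every function name below are assumptions chosen only to keep the example small and dependency-free. It shows the general pattern of training a second layer on out-of-fold predictions from the first.

```python
import random

# Toy two-layer stack on one feature. All names here are hypothetical
# and illustrative; none of this is taken from AutoGluon.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor on a single feature."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_LEARNERS = [fit_linear, fit_knn]

def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def out_of_fold_predictions(xs, ys, n_folds=5):
    """Layer 1: predict each row with a model that never saw that row."""
    n = len(xs)
    oof = [[0.0] * n for _ in BASE_LEARNERS]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        for m, learner in enumerate(BASE_LEARNERS):
            model = learner([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                oof[m][i] = model(xs[i])
    return oof

def fit_stack(xs, ys, n_folds=5, grid=21):
    """Layer 2: pick the blend weight that minimises out-of-fold error."""
    oof = out_of_fold_predictions(xs, ys, n_folds)
    weights = [i / (grid - 1) for i in range(grid)]
    best_w = min(
        weights,
        key=lambda w: mse([w * a + (1 - w) * b for a, b in zip(*oof)], ys),
    )
    # Refit both base learners on all rows for use at prediction time.
    linear, knn = (learner(xs, ys) for learner in BASE_LEARNERS)
    return best_w, lambda x: best_w * linear(x) + (1 - best_w) * knn(x)

# Synthetic data (an assumption for the demo): a noisy quadratic trend.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [x + 0.5 * x * x + rng.gauss(0, 0.3) for x in xs]

weight, stacked = fit_stack(xs, ys)
```

Because the weight grid includes 0 and 1, the blended out-of-fold error can never exceed that of either base learner alone, which is the basic appeal of combining models rather than selecting one. A real multi-layer stack would typically use many more base models and a richer second-layer learner.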
