Paper Title

DIFER: Differentiable Automated Feature Engineering

Paper Authors

Guanghui Zhu, Zhuoer Xu, Xu Guo, Chunfeng Yuan, Yihua Huang

Paper Abstract

Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve data quality. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, existing methods are computationally demanding due to treating AutoFE as a coarse-grained black-box optimization problem over a discrete space. In this work, we propose an efficient gradient-based method called DIFER to perform differentiable automated feature engineering in a continuous vector space. DIFER selects potential features based on an evolutionary algorithm and leverages an encoder-predictor-decoder controller to optimize existing features. We map features into the continuous vector space via the encoder, optimize the embedding along the gradient direction induced by the predicted score, and recover better features from the optimized embedding by the decoder. Extensive experiments on classification and regression datasets demonstrate that DIFER can significantly improve the performance of various machine learning algorithms and outperform current state-of-the-art AutoFE methods in terms of both efficiency and performance.
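
The feature-optimization step described in the abstract (encode a feature, move its embedding along the gradient of the predicted score, then decode a better feature) can be illustrated with a minimal PyTorch sketch. All module architectures, dimensions, the step size eta, and the number of ascent steps below are assumptions made for the sake of the example, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of an encoder-predictor-decoder controller with gradient
# ascent in the continuous feature-embedding space. Sizes and architectures
# are illustrative assumptions, not the implementation from the paper.
EMB_DIM, SEQ_LEN, VOCAB = 32, 8, 20

class Encoder(nn.Module):
    """Maps a tokenized feature-construction sequence to a continuous embedding."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, EMB_DIM, batch_first=True)

    def forward(self, tokens):               # tokens: (batch, SEQ_LEN) int ids
        _, h = self.rnn(self.embed(tokens))  # h: (1, batch, EMB_DIM)
        return h.squeeze(0)                  # (batch, EMB_DIM)

class Predictor(nn.Module):
    """Predicts the downstream validation score of a feature embedding."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.mlp(z).squeeze(-1)       # (batch,) predicted scores

class Decoder(nn.Module):
    """Recovers a feature-construction sequence from an embedding (greedy decoding)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMB_DIM, SEQ_LEN * VOCAB)

    def forward(self, z):
        logits = self.proj(z).view(-1, SEQ_LEN, VOCAB)
        return logits.argmax(dim=-1)         # (batch, SEQ_LEN) token ids

def improve_feature(tokens, encoder, predictor, decoder, eta=0.1, steps=5):
    """Encode a feature, ascend the predicted-score gradient in embedding
    space, then decode the optimized embedding back into a feature."""
    z = encoder(tokens).detach().requires_grad_(True)
    for _ in range(steps):
        score = predictor(z).sum()
        grad = torch.autograd.grad(score, z)[0]
        z = (z + eta * grad).detach().requires_grad_(True)  # gradient ascent step
    return decoder(z)

if __name__ == "__main__":
    enc, pred, dec = Encoder(), Predictor(), Decoder()
    feature = torch.randint(0, VOCAB, (1, SEQ_LEN))  # a dummy tokenized feature
    better_feature = improve_feature(feature, enc, pred, dec)
    print(better_feature)
```

In a full pipeline, the controller would first be trained on (feature, validation score) pairs so that the predictor's gradient is informative, and the decoded sequences would be evaluated on the downstream task before being fed back into the evolutionary selection step mentioned in the abstract.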
