Paper Title
Nonparametric Feature Selection by Random Forests and Deep Neural Networks
Paper Authors
Paper Abstract
Random forests are a widely used machine learning algorithm, but their computational efficiency suffers when they are applied to large-scale datasets with many instances and useless features. Herein, we propose a nonparametric feature selection algorithm that combines random forests and deep neural networks, and we investigate its theoretical properties under regularity conditions. Using several synthetic models and a real-world example, we demonstrate the advantages of the proposed algorithm over alternatives in identifying useful features, avoiding useless ones, and computational efficiency. Although the algorithm is presented using standard random forests, it can be adapted to a wide range of other machine learning algorithms, as long as the features can be sorted accordingly.