Paper Title

ControlBurn: Nonlinear Feature Selection with Sparse Tree Ensembles

Authors

Brian Liu, Miaolan Xie, Haoyue Yang, Madeleine Udell

Abstract

ControlBurn is a Python package to construct feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. The algorithms in this package first build large tree ensembles that prioritize basis functions with few features and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their impact on predictions. Hence ControlBurn offers the accuracy and flexibility of tree-ensemble models and the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package supports acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open-source: its documentation and source code appear on https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.
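The two-stage idea in the abstract (grow many shallow trees, then keep a feature-sparse subset via a weighted lasso) can be illustrated with a minimal sketch built on scikit-learn. This is not the ControlBurn API. The function name `sparse_tree_ensemble`, the use of bootstrap bagging with depth-2 trees, and the choice of penalty weight (the number of distinct features each tree splits on) are all assumptions made for illustration. The weighted penalty is reduced to an ordinary non-negative lasso by dividing each tree's prediction column by its weight.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor


def sparse_tree_ensemble(X, y, n_trees=100, max_depth=2, alpha=1.0, seed=0):
    """Hypothetical helper: bagged shallow trees + weighted non-negative lasso.

    Returns (selected_feature_indices, tree_weights).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees, feats = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of the rows
        tree = DecisionTreeRegressor(
            max_depth=max_depth,
            max_features="sqrt",
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(X[idx], y[idx])
        # Internal nodes store the split feature index; leaves store -2.
        used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
        if used.size > 0:  # skip trees that never split
            trees.append(tree)
            feats.append(used)

    # Column i holds tree i's predictions on the full training set.
    A = np.column_stack([t.predict(X) for t in trees])
    # Assumed penalty weight u_i: number of distinct features in tree i.
    u = np.array([f.size for f in feats], dtype=float)

    # Weighted lasso  min ||y - A w||^2 / (2n) + alpha * sum_i u_i w_i,  w >= 0.
    # Substituting v_i = u_i * w_i turns it into a plain lasso on A / u.
    lasso = Lasso(alpha=alpha, positive=True, max_iter=10_000)
    lasso.fit(A / u, y)
    w = lasso.coef_ / u

    # A feature is selected if any tree with nonzero weight splits on it.
    selected = sorted({int(j) for f, wi in zip(feats, w) if wi > 0 for j in f})
    return selected, w


# Synthetic data: 10 features, only 3 of which carry signal.
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)
selected, weights = sparse_tree_ensemble(X, y, alpha=1.0)
print("selected features:", selected)
print("trees kept:", int((weights > 0).sum()), "of", weights.size)
```

Raising `alpha` drives more tree weights to zero and so shrinks the selected feature set. Sweeping `alpha` over a grid traces out a rough analogue of the regularization path the abstract describes, though without the warm-start continuation or sketching that the package itself is stated to use.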
