Paper Title

Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees

Authors

Wenying Deng, Beau Coker, Rajarshi Mukherjee, Jeremiah Zhe Liu, Brent A. Coull

Abstract


We develop a simple and unified framework for nonlinear variable selection that incorporates uncertainty in the prediction function and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods, neural networks, etc.). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated partial derivative $\Psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_{P_\mathcal{X}}$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations and experiments on healthcare benchmark datasets confirm that the proposed algorithm outperforms existing classic and recent variable selection methods.
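To make the importance measure concrete, the quantity $\Psi_j$ can be estimated empirically as the average squared partial derivative of $f$ over a sample drawn from $P_\mathcal{X}$. The sketch below is purely illustrative and is not the authors' implementation: it uses central finite differences (rather than the paper's posterior-based machinery) on a hypothetical toy function `f` and sample `X`.

```python
import numpy as np

def psi_importance(f, X, eps=1e-5):
    """Monte Carlo estimate of Psi_j = || d f / d x^j ||^2_{P_X}
    for each input dimension j, using central finite differences
    over the empirical sample X of shape (n, d).
    Illustrative only; the paper derives a posterior over Psi_j instead."""
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grad_j = (f(Xp) - f(Xm)) / (2 * eps)  # approximate partial derivative, shape (n,)
        psi[j] = np.mean(grad_j ** 2)          # empirical squared L2(P_X) norm
    return psi

# Toy example (hypothetical): f depends on x^0 and x^1 only; x^2 is irrelevant,
# so its estimated importance should be essentially zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
print(psi_importance(f, X))
```

A relevant variable receives a strictly positive $\Psi_j$, while a variable that $f$ ignores receives zero, which is what makes this a natural screening statistic once uncertainty in $f$ is accounted for.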
