Paper title

Model free variable importance for high dimensional data

Authors

Naofumi Hama, Masayoshi Mase, Art B. Owen

Abstract

A model-agnostic variable importance method can be used with arbitrary prediction functions. Here we present some model-free methods that do not require access to the prediction function. This is useful when that function is proprietary and not available, or just extremely expensive. It is also useful when studying residuals from a model. The cohort Shapley (CS) method is model-free but has exponential cost in the dimension of the input space. A supervised on-manifold Shapley method from Frye et al. (2020) is also model-free but requires as input a second black box model that has to be trained for the Shapley value problem. We introduce an integrated gradient (IG) version of cohort Shapley, called IGCS, with cost $\mathcal{O}(nd)$. We show that over the vast majority of the relevant unit cube, the IGCS value function is close to a multilinear function for which IGCS matches CS. Another benefit of IGCS is that it allows IG methods to be used with binary predictors. We use some area between curves (ABC) measures to quantify the performance of IGCS. On a problem from high energy physics we verify that IGCS has nearly the same ABCs as CS does. We also use it on a problem from computational chemistry in 1024 variables. We see there that IGCS attains much higher ABCs than we get from Monte Carlo sampling. The code is publicly available at https://github.com/cohortshapley/cohortintgrad
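To make the exponential cost of CS concrete, here is a minimal brute-force sketch of cohort Shapley for a single target subject. It is an illustration under stated assumptions, not the code in the linked repository: the function name `cohort_shapley` is hypothetical, similarity is assumed to mean exact equality with the target's value, and the value of a variable subset is taken to be the mean response over the subjects that match the target on every variable in that subset. Only the data `X` and responses `y` are used, so no prediction function is needed.

```python
from itertools import combinations
from math import comb

import numpy as np


def cohort_shapley(X, y, t):
    """Brute-force cohort Shapley values for target row t.

    Assumption: two subjects are 'similar' on a variable when their
    values are exactly equal (a simplification for illustration).
    """
    n, d = X.shape
    # similar[i, j] is True when subject i matches the target on variable j.
    similar = X == X[t]

    def value(u):
        # Mean response over the cohort matching the target on all of u.
        # The target always belongs to its own cohort, so it is never empty.
        cols = np.array(u, dtype=int)
        members = similar[:, cols].all(axis=1)
        return y[members].mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for subsets of this size.
            weight = 1.0 / (d * comb(d - 1, size))
            for u in combinations(others, size):
                phi[j] += weight * (value(u + (j,)) - value(u))
    return phi
```

For each variable the inner loops visit every subset of the remaining $d-1$ variables, so the work grows exponentially in $d$. That is workable for a handful of variables but not for the 1024-variable problem mentioned in the abstract, which is the setting where an $\mathcal{O}(nd)$ method such as IGCS becomes relevant.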
