Paper title
A Generalized Variable Importance Metric and Estimator for Black Box Machine Learning Models
Paper authors
Abstract
In this paper, we define a population parameter, the ``Generalized Variable Importance Metric (GVIM)'', to measure the importance of predictors for black box machine learning methods, where importance is not represented by a model-based parameter. The GVIM is defined for each input variable using the true conditional expectation function, and it measures the variable's importance in affecting a continuous or binary response. We extend previously published results to show that the GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for any kind of predictor, which gives it a causal interpretation and further justifies it as an alternative to classical measures of significance that are available only in simple parametric models. An extensive set of simulations, using realistically complex relationships between covariates and outcomes and a number of regression techniques of varying complexity, illustrates the performance of our proposed GVIM estimator.