Paper title
Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling
Authors
Abstract
Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically-efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finite-sample Type I error and power.
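To make the resampling idea in the abstract concrete, here is a minimal sketch of exact CSS testing (not the aCSS extension the paper proposes) for a toy null model chosen purely for illustration: x_1, ..., x_n i.i.d. N(theta, 1) with theta unknown. The sample mean is sufficient for this model, so copies of the data drawn conditional on the mean have the same null distribution as the data whatever theta is, and any test statistic can be calibrated against them. The function names `css_copies`, `css_pvalue` and `abs_skewness`, the unit-variance assumption, and the choice of statistic are all hypothetical and not taken from the paper.

```python
import numpy as np

def css_copies(x, n_copies, rng=None):
    """Draw copies of x conditional on its mean, under N(theta, 1) (assumed toy null)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    # Under the null, x - mean(x) is N(0, I - 11^T/n) regardless of theta,
    # so centred standard normal noise shifted by mean(x) is an exact
    # draw from the conditional distribution of the data given its mean.
    z = rng.standard_normal((n_copies, x.size))
    return x.mean() + z - z.mean(axis=1, keepdims=True)

def css_pvalue(x, statistic, n_copies=999, rng=None):
    """Monte Carlo p-value; large values of `statistic` count as evidence against the null."""
    copies = css_copies(x, n_copies, rng)
    t_obs = statistic(np.asarray(x, dtype=float))
    t_copies = np.array([statistic(c) for c in copies])
    # The observed data counts as one of the exchangeable draws,
    # which is why 1 is added to both numerator and denominator.
    return (1 + np.sum(t_copies >= t_obs)) / (n_copies + 1)

def abs_skewness(v):
    """An arbitrary analyst-chosen statistic: absolute sample skewness."""
    c = v - v.mean()
    return abs(np.mean(c**3)) / np.mean(c**2) ** 1.5

if __name__ == "__main__":
    data = np.random.default_rng(0).exponential(1.0, size=200)  # skewed, so not Gaussian
    print(css_pvalue(data, abs_skewness, rng=1))
```

Any other function of the data could be passed as `statistic` without changing the validity argument. That argument rests on the sample mean being sufficient, which is the requirement the abstract says fails for models such as logistic regression.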