Paper title
Dimension-agnostic inference using cross U-statistics
Paper authors
Abstract
Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d$ and $n$ both increase to infinity together. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption about $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We exemplify our technique for some classical problems including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
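To make the splitting and self-normalization steps concrete, here is a minimal sketch for the one-sample mean problem $H_0: \mathbb{E}[X] = 0$. It is an illustration under stated assumptions, not the authors' code: the function name `cross_mean_test` is hypothetical, the data are split 50/50 in the given order, and the test rejects for large values of the statistic (one-sided). For each point $X_i$ in the first half it computes $f_i = X_i^\top \bar{X}_2$, where $\bar{X}_2$ is the mean of the second half; the average of the $f_i$ equals $\frac{1}{n_1 n_2}\sum_{i \le n_1}\sum_{j > n_1} X_i^\top X_j$, the off-diagonal block of the degenerate U-statistic. Studentizing that average gives a statistic compared against a standard normal cutoff.

```python
import numpy as np
from statistics import NormalDist


def cross_mean_test(X, alpha=0.05):
    """Hypothetical sketch of a cross U-statistic test of H0: E[X] = 0.

    X has shape (n, d). Returns (statistic, p_value, reject).
    Assumes a 50/50 split in the given row order (no shuffling).
    """
    X = np.asarray(X, dtype=float)
    n1 = X.shape[0] // 2
    X1, X2 = X[:n1], X[n1:]  # sample splitting

    # f_i = <X_i, mean of second half> for each X_i in the first half.
    # mean(f) = (1 / (n1 * n2)) * sum_{i in X1, j in X2} <X_i, X_j>,
    # i.e. only the off-diagonal (cross) block of the U-statistic is kept.
    f = X1 @ X2.mean(axis=0)

    # Self-normalization: studentize mean(f) by the sample std of the f_i,
    # so the statistic is compared with N(0, 1) whatever the ratio d / n.
    stat = np.sqrt(n1) * f.mean() / f.std(ddof=1)

    # One-sided p-value: under a mean shift mu, E[f_i] is close to ||mu||^2 > 0.
    p_value = 1.0 - NormalDist().cdf(stat)
    return stat, p_value, p_value < alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    null_data = rng.standard_normal((100, 20))  # 100 samples, 20 dimensions, mean zero
    print(cross_mean_test(null_data))
    print(cross_mean_test(null_data + 0.5))     # same data with every coordinate shifted
```

The design choice worth noting is that the cutoff is the same normal quantile whether $d$ is 2 or 2000: because the diagonal terms $X_i^\top X_i$ never enter, no estimate of the trace of the covariance is needed to center or scale the statistic.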