Paper Title
Making Progress Based on False Discoveries
Paper Authors
Paper Abstract
The study of adaptive data analysis examines how many statistical queries can be answered accurately using a fixed dataset while avoiding false discoveries (statistically inaccurate answers). In this paper, we address a question that precedes this line of work: is data valuable only when it provides accurate answers to statistical queries? To answer this question, we use Stochastic Convex Optimization as a case study. In this model, algorithms are viewed as analysts that, at each iteration, query an estimate of the gradient of a noisy function and move towards its minimizer. It is well known that $O(1/ε^2)$ examples suffice to minimize the objective function, but none of the existing methods depend on the accuracy of the estimated gradients along the trajectory. We therefore ask: how many samples are needed to minimize a noisy convex function if we require $ε$-accurate estimates of $O(1/ε^2)$ gradients? Or, put differently, are inaccurate gradient estimates \emph{necessary} for finding the minimum of a stochastic convex function at an optimal statistical rate? We provide two partial answers to this question. First, we show that a general analyst (whose queries may be chosen maliciously) requires $Ω(1/ε^3)$ samples, ruling out the possibility of a foolproof mechanism. Second, we show that, under certain assumptions on the oracle, $\tilde Ω(1/ε^{2.5})$ samples are necessary for gradient descent interacting with the oracle. Our results stand in contrast to classical bounds showing that $O(1/ε^2)$ samples can optimize the population risk to accuracy $O(ε)$, but with spurious gradients.
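The phenomenon the abstract describes can be illustrated with a minimal sketch (not the paper's construction): stochastic gradient descent on a simple one-dimensional population risk, where each oracle query returns a single-sample gradient estimate that is individually inaccurate, yet the averaged iterate still minimizes the population risk. The objective $F(w) = \mathbb{E}_z[(w-z)^2/2]$ with $z \sim \mathcal{N}(\mu, 1)$, the step size, and all numeric choices below are illustrative assumptions.

```python
import random

# Illustrative sketch: SGD on F(w) = E_z[(w - z)^2 / 2] with z ~ N(mu, 1),
# whose population minimizer is w* = mu.  Each oracle query uses one fresh
# sample, so the returned gradient w - z has O(1) error -- far from
# ε-accurate -- yet the averaged iterate still converges to w*.

MU = 2.0  # true minimizer (unknown to the algorithm)

def noisy_gradient(w):
    z = random.gauss(MU, 1.0)  # one fresh sample per query
    return w - z               # gradient of f(w, z) = (w - z)^2 / 2

def sgd(n_steps, eta=0.01, w0=0.0):
    w, avg = w0, 0.0
    for _ in range(n_steps):
        w -= eta * noisy_gradient(w)
        avg += w / n_steps     # average the iterates along the trajectory
    return avg

random.seed(0)
w_hat = sgd(10_000)
print(abs(w_hat - MU))  # small, although every single gradient estimate was noisy
```

This is the contrast the abstract draws: optimizing the population risk needs only noisy, individually unreliable gradient estimates, whereas demanding that every answer along the trajectory be statistically accurate is shown to require strictly more samples.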