Paper Title
A Pareto Dominance Principle for Data-Driven Optimization
Paper Authors
Paper Abstract
We propose a statistically optimal approach to constructing data-driven decisions for stochastic optimization problems. Fundamentally, a data-driven decision is simply a function that maps the available training data to a feasible action. It can always be expressed as the minimizer of a surrogate optimization model constructed from the data. The quality of a data-driven decision is measured by its out-of-sample risk. An additional quality measure is its out-of-sample disappointment, which we define as the probability that the out-of-sample risk exceeds the optimal value of the surrogate optimization model. An ideal data-driven decision should minimize the out-of-sample risk simultaneously with respect to every conceivable probability measure, because the true measure is unknown. Unfortunately, such ideal data-driven decisions are generally unavailable. This prompts us to seek data-driven decisions that minimize the in-sample risk subject to an upper bound on the out-of-sample disappointment. We prove that such Pareto-dominant data-driven decisions exist under conditions that allow for interesting applications: the unknown data-generating probability measure must belong to a parametric ambiguity set, and the corresponding parameters must admit a sufficient statistic that satisfies a large deviation principle. We further prove that the surrogate optimization model must be a distributionally robust optimization problem constructed from the sufficient statistic and the rate function of its large deviation principle. Hence, the optimal method for mapping data to decisions is to solve a distributionally robust optimization model. Perhaps surprisingly, this result holds even when the training data are not i.i.d. Our analysis reveals how the structural properties of the data-generating stochastic process impact the shape of the ambiguity set underlying the optimal distributionally robust model.
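As a reading aid, the definitions in the abstract can be stated compactly. The LaTeX sketch below is our own formalization under assumptions we supply: the loss $\ell$, feasible set $X$, parametric family $\{\mathbb{P}_\theta\}$, sample law $\mathbb{P}^{(n)}_\theta$, sufficient statistic $\hat{\theta}_n$, rate function $I$, and radius $r$ are our labels and need not match the paper's notation.

% Minimal sketch of the central objects (our notation, hypothetical).
% ell(x, xi): loss of action x in X under outcome xi;
% {P_theta}: parametric ambiguity set containing the unknown true measure;
% hat{x}_n, hat{c}_n: data-driven decision and surrogate optimal value, both
%   functions of the training sample (xi_1, ..., xi_n) with law P_theta^{(n)}.
\begin{align*}
  \text{out-of-sample risk:} \quad
    & \mathcal{R}(\hat{x}_n,\theta) = \mathbb{E}_{\mathbb{P}_\theta}\bigl[\ell(\hat{x}_n,\xi)\bigr], \\
  \text{out-of-sample disappointment:} \quad
    & \mathbb{P}^{(n)}_\theta\bigl[\mathcal{R}(\hat{x}_n,\theta) > \hat{c}_n\bigr].
\end{align*}
% A distributionally robust surrogate of the kind the abstract describes
% optimizes over a rate-function ball around the sufficient statistic:
\[
  \hat{c}_n \;=\; \min_{x \in X} \; \sup_{\theta \,:\, I(\hat{\theta}_n,\,\theta) \le r}
    \mathbb{E}_{\mathbb{P}_\theta}\bigl[\ell(x,\xi)\bigr],
\]
where, in results of this type, the radius $r > 0$ is tied to the exponential decay rate of the out-of-sample disappointment.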