Paper Title

Optimal Representative Sample Weighting

Authors

Shane Barratt, Guillermo Angeris, Stephen Boyd

Abstract

We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
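
To make the weighting idea in the abstract concrete, here is a minimal sketch of one convex instance: choose nonnegative weights summing to one so that the weighted feature averages match prescribed targets in the least-squares sense. It does not use the rsw package and is not the paper's solver; it swaps in plain exponentiated gradient (mirror descent on the probability simplex). The function name `representative_weights`, the step size, the iteration count, and the toy data are all assumptions made for illustration.

```python
import math


def representative_weights(features, targets, steps=2000, lr=0.5):
    """Hypothetical helper: minimize 0.5 * sum_j (sum_i w_i * F[i][j] - targets[j])**2
    over the probability simplex using exponentiated gradient updates."""
    n = len(features)
    m = len(targets)
    w = [1.0 / n] * n  # start from uniform weights
    for _ in range(steps):
        # Residual of each weighted feature average against its target.
        r = [sum(w[i] * features[i][j] for i in range(n)) - targets[j]
             for j in range(m)]
        # Gradient of the least-squares loss with respect to each weight.
        g = [sum(r[j] * features[i][j] for j in range(m)) for i in range(n)]
        # Multiplicative update keeps weights positive; renormalize to sum to 1.
        w = [w[i] * math.exp(-lr * g[i]) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


# Toy skewed sample (made up): 8 records from group A (feature 1.0),
# 2 records from group B (feature 0.0). Target share of group A: 0.5.
features = [[1.0]] * 8 + [[0.0]] * 2
targets = [0.5]

w = representative_weights(features, targets)
weighted_mean = sum(wi * f[0] for wi, f in zip(w, features))
print("weights:", [round(wi, 4) for wi in w])
print("weighted share of group A:", round(weighted_mean, 4))
```

In this toy case the under-represented group B records end up with larger weights than the group A records, so the weighted share of group A moves from 0.8 toward the 0.5 target. A real application would likely add a regularizer that keeps the weights close to uniform and would use a general-purpose convex solver.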
