Paper title

Variance reduced Shapley value estimation for trustworthy data valuation

Authors

Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

Abstract

Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of the permutation sampling algorithm. To make up for the large estimation variance of the permutation sampling that hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We theoretically show how to stratify, how many samples are taken at each stratum, and the sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated in different types of datasets and data removal applications.
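The contrast between permutation sampling and stratified sampling can be illustrated with a minimal Python sketch. This is not the authors' VRDS algorithm: the abstract does not state how strata are defined or how samples are allocated, so the sketch assumes strata are coalition sizes and draws the same number of samples from every stratum. The function names, the `utility` callable, and the toy utility in the usage note are hypothetical.

```python
import random


def permutation_shapley(utility, n, num_perms, rng):
    """Permutation-sampling estimate of the data Shapley value of n points."""
    values = [0.0] * n
    for _ in range(num_perms):
        perm = list(range(n))
        rng.shuffle(perm)
        prefix = []
        prev = utility(frozenset())
        for i in perm:
            # Marginal contribution of i to the points that precede it.
            prefix.append(i)
            cur = utility(frozenset(prefix))
            values[i] += cur - prev
            prev = cur
    return [v / num_perms for v in values]


def stratified_shapley(utility, n, samples_per_stratum, rng):
    """Stratified estimate, with strata taken to be coalition sizes 0..n-1.

    Assumption: equal allocation across strata, chosen only for simplicity.
    """
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            stratum_sum = 0.0
            for _ in range(samples_per_stratum):
                # Uniformly sample a coalition of size k that excludes i.
                s = frozenset(rng.sample(others, k))
                stratum_sum += utility(s | {i}) - utility(s)
            total += stratum_sum / samples_per_stratum
        # Each coalition size carries weight 1/n in the Shapley value.
        values.append(total / n)
    return values
```

As a usage example, take the toy utility `lambda s: len(s) ** 2` with `n = 4`. Every point then has Shapley value 4, and the marginal contribution depends only on the coalition size, so the stratified estimate is exact even with one sample per stratum, while a single permutation yields values that depend on each point's position.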
