Paper Title

CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification

Authors

Stephanie Schoch, Haifeng Xu, Yangfeng Ji

Abstract

Data valuation, or the valuation of individual datum contributions, has seen growing interest in machine learning due to its demonstrable efficacy for tasks such as noisy label detection. In particular, due to the desirable axiomatic properties, several Shapley value approximation methods have been proposed. In these methods, the value function is typically defined as the predictive accuracy over the entire development set. However, this limits the ability to differentiate between training instances that are helpful or harmful to their own classes. Intuitively, instances that harm their own classes may be noisy or mislabeled and should receive a lower valuation than helpful instances. In this work, we propose CS-Shapley, a Shapley value with a new value function that discriminates between training instances' in-class and out-of-class contributions. Our theoretical analysis shows the proposed value function is (essentially) the unique function that satisfies two desirable properties for evaluating data values in classification. Further, our experiments on two benchmark evaluation tasks (data removal and noisy label detection) and four classifiers demonstrate the effectiveness of CS-Shapley over existing methods. Lastly, we evaluate the "transferability" of data values estimated from one classifier to others, and our results suggest Shapley-based data valuation is transferable for application across different models.
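To make the distinction between overall, in-class, and out-of-class contributions concrete, here is a minimal sketch in Python. The abstract does not state the CS-Shapley value function, so this is not the paper's method: it only computes exact Shapley values on a tiny hypothetical dataset under three separate value functions (overall development-set accuracy, accuracy on the development points of a training point's own class, and accuracy on the remaining development points). The 1-nearest-neighbour classifier, the toy data, and all function names are assumptions chosen for illustration.

```python
from itertools import permutations

# Toy 1-D data (hypothetical): (feature, label). The last training point sits
# among class-0 points but carries label 1.
TRAIN = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
DEV = [(0.2, 0), (0.8, 0), (4.2, 1), (4.8, 1)]


def predict_1nn(train, x):
    """Label of the nearest training point, or None if there is no training data."""
    if not train:
        return None
    # The key includes the point itself so ties never depend on list order.
    return min(train, key=lambda p: (abs(p[0] - x), p))[1]


def accuracy(train, dev):
    """Fraction of dev points classified correctly (0.0 for an empty dev set)."""
    if not dev:
        return 0.0
    return sum(predict_1nn(train, x) == y for x, y in dev) / len(dev)


def shapley(train, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(train)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        subset = []
        prev = value(subset)
        for i in order:
            subset = subset + [train[i]]
            cur = value(subset)
            phi[i] += cur - prev
            prev = cur
    return [p / len(orders) for p in phi]


def class_accuracy(dev, keep):
    """Value function: accuracy restricted to dev points whose label passes `keep`."""
    part = [(x, y) for x, y in dev if keep(y)]
    return lambda subset: accuracy(subset, part)


def decomposed_values(train, dev):
    """Overall Shapley values plus per-point in-class and out-of-class components."""
    overall = shapley(train, lambda s: accuracy(s, dev))
    in_class = [0.0] * len(train)
    out_class = [0.0] * len(train)
    for c in sorted({y for _, y in train}):
        phi_in = shapley(train, class_accuracy(dev, lambda y, c=c: y == c))
        phi_out = shapley(train, class_accuracy(dev, lambda y, c=c: y != c))
        for i, (_, y) in enumerate(train):
            if y == c:  # each point is scored relative to its own label
                in_class[i], out_class[i] = phi_in[i], phi_out[i]
    return overall, in_class, out_class


overall, in_class, out_class = decomposed_values(TRAIN, DEV)
for point, o, a, b in zip(TRAIN, overall, in_class, out_class):
    print(point, f"overall={o:+.3f} in-class={a:+.3f} out-of-class={b:+.3f}")
```

Because the development set here is balanced across two classes, each overall value is exactly the average of the in-class and out-of-class components, by linearity of the Shapley value. A single overall-accuracy score therefore cannot tell the two components apart. In this toy data the last training point, whose label disagrees with its neighbours, ends up with a negative out-of-class component.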
