Statistical anonymity: Quantifying reidentification risks without reidentifying users

Paper Title

Statistical anonymity: Quantifying reidentification risks without reidentifying users

Authors

Bravo-Hermsdorff, Gecia; Busa-Fekete, Robert; Gunderson, Lee M.; Medina, Andrés Muñoz; Syed, Umar

Abstract

Data anonymization is an approach to privacy-preserving data release aimed at preventing the reidentification of participants, and it is an important alternative to differential privacy in applications that cannot tolerate noisy data. Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data. Reasons for limiting this access range from undesirability to complete infeasibility. This paper explores ideas -- objectives, metrics, protocols, and extensions -- for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity. We suggest trust (the amount of information provided to the curator) and privacy (the anonymity of the participants) as the primary objectives of such a framework. We describe a class of protocols aimed at achieving these goals, proposing new metrics of privacy in the process and proving related bounds. We conclude by discussing a natural extension of this work that completely removes the need for a central curator.
