Paper Title

Fairness and Cost Constrained Privacy-Aware Record Linkage

Authors

Nan Wu, Dinusha Vatsalan, Sunny Verma, Mohamed Ali Kaafar

Abstract

Record linkage algorithms match and link records from different databases that refer to the same real-world entity, based on direct and/or quasi-identifiers available in the records, such as name, address, age, and gender. Since these identifiers generally contain personally identifiable information (PII) about the entities, record linkage algorithms need to be developed with privacy constraints. This problem is known as privacy-preserving record linkage (PPRL), and many research studies have been conducted to perform the linkage on encoded and/or encrypted identifiers. Differential privacy (DP) combined with computationally efficient encoding methods, e.g. Bloom filter encoding, has been used to develop PPRL with provable privacy guarantees. However, the standard DP notion does not address other constraints, the most important of which are fairness bias and the cost of linkage in terms of the number of record pairs to be compared. In this work, we propose new notions of fairness-constrained DP and fairness- and cost-constrained DP for PPRL, and develop a framework for PPRL that combines these new notions of DP with Bloom filter encoding. We provide theoretical proofs for the new DP notions for fairness- and cost-constrained PPRL and experimentally evaluate them on two datasets containing person-specific data. Our experimental results show that, with these new notions of DP, PPRL can achieve better performance with regard to privacy, cost, and fairness constraints than with the standard DP notion for PPRL.
