Paper Title

GeoPointGAN: Synthetic Spatial Data with Local Label Differential Privacy

Authors

Teddy Cunningham, Konstantin Klemmer, Hongkai Wen, Hakan Ferhatosmanoglu

Abstract

Synthetic data generation is a fundamental task for many data management and data science applications. Spatial data is of particular interest, and its sensitive nature often leads to privacy concerns. We introduce GeoPointGAN, a novel GAN-based solution for generating synthetic spatial point datasets with high utility and strong individual level privacy guarantees. GeoPointGAN's architecture includes a novel point transformation generator that learns to project randomly generated point co-ordinates into meaningful synthetic co-ordinates that capture both microscopic (e.g., junctions, squares) and macroscopic (e.g., parks, lakes) geographic features. We provide our privacy guarantees through label local differential privacy, which is more practical than traditional local differential privacy. We seamlessly integrate this level of privacy into GeoPointGAN by augmenting the discriminator to the point level and implementing a randomized response-based mechanism that flips the labels associated with the 'real' and 'fake' points used in training. Extensive experiments show that GeoPointGAN significantly outperforms recent solutions, improving by up to 10 times compared to the most competitive baseline. We also evaluate GeoPointGAN using range, hotspot, and facility location queries, which confirm the practical effectiveness of GeoPointGAN for privacy-preserving querying. The results illustrate that a strong level of privacy is achieved with little-to-no adverse utility cost, which we explain through the generalization and regularization effects that are realized by flipping the labels of the data during training.
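
The label-flipping step mentioned in the abstract can be illustrated with standard binary randomized response. The following is a minimal sketch, not the authors' implementation: it assumes each training point carries a binary label (1 for 'real', 0 for 'fake') and that a label is kept with probability e^ε / (1 + e^ε) and flipped otherwise, which is the textbook mechanism for ε-label-local differential privacy. The function names `keep_probability` and `randomized_response` are hypothetical and chosen only for this example.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true binary label.

    Equals e^eps / (1 + e^eps), written as 1 / (1 + e^-eps) so that
    large positive eps does not overflow.
    """
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(labels, epsilon, rng=None):
    """Flip each 0/1 label independently with probability 1 - keep_probability.

    `labels` is an iterable of 0/1 values (assumed: 1 = 'real', 0 = 'fake').
    `rng` is an optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    p_keep = keep_probability(epsilon)
    return [y if rng.random() < p_keep else 1 - y for y in labels]


if __name__ == "__main__":
    true_labels = [1, 1, 1, 0, 0, 0]
    noisy_labels = randomized_response(true_labels, epsilon=1.0, rng=random.Random(0))
    print(noisy_labels)
```

In this sketch a smaller ε moves the keep probability toward 0.5, so the noisy labels reveal less about the true ones. A larger ε leaves most labels unchanged. How the noisy labels enter the discriminator loss is specific to the paper and is not shown here.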
