Privacy-Preserving Methods for Vertically Partitioned Incomplete Data

Paper Title

Privacy-Preserving Methods for Vertically Partitioned Incomplete Data

Authors

Yi Deng, Xiaoqian Jiang, Qi Long

Abstract

Distributed health data networks that use information from multiple sources have drawn substantial interest in recent years. However, missing data are prevalent in such networks and present significant analytical challenges. The current state-of-the-art methods for handling missing data require pooling data into a central repository before analysis, which may not be possible in a distributed health data network. In this paper, we propose a privacy-preserving distributed analysis framework for handling missing data when data are vertically partitioned. In this framework, each institution with a particular data source utilizes the local private data to calculate necessary intermediate aggregated statistics, which are then shared to build a global model for handling missing data. To evaluate our proposed methods, we conduct simulation studies that clearly demonstrate that the proposed privacy-preserving methods perform as well as the methods using the pooled data and outperform several naïve methods. We further illustrate the proposed methods through the analysis of a real dataset. The proposed framework for handling vertically partitioned incomplete data is substantially more privacy-preserving than methods that require pooling of the data, since no individual-level data are shared, which can lower hurdles for collaboration across multiple institutions and build stronger public trust.
