Paper title
Statistic Selection and MCMC for Differentially Private Bayesian Estimation
Paper authors
Paper abstract
This paper concerns differentially private Bayesian estimation of the parameters of a population distribution when a statistic of a sample from that population is shared with added noise to provide differential privacy. This work mainly addresses two questions. (1) What statistic of the sample should be shared privately? For this first question, the one about statistic selection, we advocate using the Fisher information. We find that the statistic that is most informative in a non-private setting may not be the optimal choice under privacy restrictions, and we provide several examples to support that point. We consider several types of data sharing settings and propose several Monte Carlo-based numerical methods for estimating the Fisher information in those settings. (2) Based on the shared statistic, how can we perform effective Bayesian inference? For this second question, which concerns inference, we propose several Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution of the parameter given the noisy statistic. Which of the proposed MCMC algorithms is preferable depends on the problem. For example, when the shared statistic is additive and the added noise is Gaussian, a simple Metropolis-Hastings algorithm that utilizes the central limit theorem is a decent choice. We propose more advanced MCMC algorithms for several other cases of practical relevance. Our numerical examples involve comparing several candidate statistics to be shared privately. For each statistic, we perform Bayesian estimation based on the posterior distribution conditional on the privatized version of that statistic. We demonstrate that the relative performance of a statistic, in terms of the mean squared error of the Bayesian estimator based on the corresponding privatized statistic, is adequately predicted by the Fisher information of the privatized statistic.
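
To make the central-limit-theorem case above concrete, here is a minimal sketch of a random-walk Metropolis-Hastings sampler. It is not taken from the paper; everything specific in it is an assumption chosen for illustration: the data are Bernoulli(theta), the shared statistic is their sum with Gaussian noise of known standard deviation `noise_sd`, the prior on theta is Uniform(0, 1), and the names `log_post` and `mh_sample` are hypothetical. Under these assumptions the CLT gives the approximate likelihood y | theta ~ N(n·theta, n·theta·(1 − theta) + noise_sd²).

```python
import numpy as np


def log_post(theta, y, n, noise_sd):
    """Approximate log posterior of theta, up to an additive constant.

    Assumptions (illustrative only): Uniform(0, 1) prior, Bernoulli(theta)
    data, and a CLT approximation for the sum of the n observations.
    """
    if not 0.0 < theta < 1.0:
        return -np.inf  # outside the support of the prior
    mean = n * theta
    # Variance of the sum plus variance of the added Gaussian noise.
    var = n * theta * (1.0 - theta) + noise_sd ** 2
    return -0.5 * np.log(var) - 0.5 * (y - mean) ** 2 / var


def mh_sample(y, n, noise_sd, n_iter=20000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings chain targeting the approximate posterior."""
    rng = np.random.default_rng(seed)
    theta = 0.5  # arbitrary starting point inside (0, 1)
    lp = log_post(theta, y, n, noise_sd)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal()  # symmetric proposal
        lp_prop = log_post(prop, y, n, noise_sd)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, true_theta, noise_sd = 1000, 0.3, 10.0
    x = rng.binomial(1, true_theta, size=n)
    y = x.sum() + noise_sd * rng.standard_normal()  # privatized statistic
    chain = mh_sample(y, n, noise_sd)
    print("posterior mean estimate:", chain[5000:].mean())
```

The sketch treats `noise_sd` as given; calibrating it to a particular privacy budget is outside its scope. The proposal step size and burn-in length are untuned guesses and would need adjusting for other sample sizes or noise levels.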