Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

Paper Title

Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

Authors

Savitsky, Terrance D.; Williams, Matthew R.; Gershunskaya, Julie; Beresovsky, Vladislav; Johnson, Nels G.

Abstract

Nonprobability (convenience) samples are increasingly sought to reduce the estimation variance for one or more population variables of interest that are estimated using a randomized survey (reference) sample by increasing the effective sample size. Estimation of a population quantity derived from a convenience sample will typically result in bias since the distribution of variables of interest in the convenience sample is different from the population distribution. A recent set of approaches estimates inclusion probabilities for convenience sample units by specifying reference sample-weighted pseudo likelihoods. This paper introduces a novel approach that derives the propensity score for the observed sample as a function of inclusion probabilities for the reference and convenience samples as our main result. Our approach allows specification of a likelihood directly for the observed sample as opposed to the approximate or pseudo likelihood. We construct a Bayesian hierarchical formulation that simultaneously estimates sample propensity scores and the convenience sample inclusion probabilities. We use a Monte Carlo simulation study to compare our likelihood based results with the pseudo likelihood based approaches considered in the literature.
