Paper title
Uncertainty Quantification of MLE for Entity Ranking with Covariates
Paper authors
Paper abstract
This paper concerns statistical estimation and inference for ranking problems based on pairwise comparisons with additional covariate information, such as the attributes of the compared items. Despite extensive studies, few prior works investigate this problem under the more realistic setting where covariate information exists. To tackle this issue, we propose a novel model, the Covariate-Assisted Ranking Estimation (CARE) model, which extends the well-known Bradley-Terry-Luce (BTL) model by incorporating covariate information. Specifically, instead of assuming every compared item has a fixed latent score $\{θ_i^*\}_{i=1}^n$, we assume the underlying scores are given by $\{α_i^*+{x}_i^\topβ^*\}_{i=1}^n$, where $α_i^*$ and ${x}_i^\topβ^*$ represent the latent baseline and covariate scores of the $i$-th item, respectively. We impose natural identifiability conditions and derive the $\ell_{\infty}$- and $\ell_2$-optimal rates for the maximum likelihood estimator (MLE) of $\{α_i^*\}_{i=1}^{n}$ and $β^*$ under a sparse comparison graph, using a novel `leave-one-out' technique (Chen et al., 2019). To conduct statistical inference, we further derive asymptotic distributions for the MLE of $\{α_i^*\}_{i=1}^n$ and $β^*$ with minimal sample complexity. This allows us to answer the question of whether some covariates have any explanatory power for the latent scores and to threshold some sparse parameters to improve the ranking performance. We improve the approximation method used in Gao et al. (2021) for the BTL model and generalize it to the CARE model. Moreover, we validate our theoretical results through large-scale numerical studies and an application to a mutual fund stock holding dataset.
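To make the model structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard BTL comparison rule, under which item $i$ beats item $j$ with probability $e^{s_i}/(e^{s_i}+e^{s_j})$, and plugs in the CARE scores $s_i = α_i + {x}_i^\topβ$. The function names (`care_scores`, `win_probability`, `negative_log_likelihood`) and the toy numbers are hypothetical. The identifiability conditions mentioned in the abstract are not stated there, so they are not enforced here.

```python
import math


def care_scores(alpha, X, beta):
    """Latent scores s_i = alpha_i + x_i^T beta, one per item."""
    return [
        a + sum(x_k * b_k for x_k, b_k in zip(x, beta))
        for a, x in zip(alpha, X)
    ]


def win_probability(score_i, score_j):
    """BTL probability that item i beats item j (logistic in the score gap)."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def negative_log_likelihood(alpha, beta, X, comparisons):
    """Negative log-likelihood of observed (winner, loser) index pairs.

    An MLE would minimize this over (alpha, beta), subject to whatever
    identifiability constraints the model imposes (not enforced here).
    """
    scores = care_scores(alpha, X, beta)
    total = 0.0
    for winner, loser in comparisons:
        diff = scores[winner] - scores[loser]
        # -log sigmoid(diff) = log(1 + exp(-diff)), written in a stable form
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total


# Hypothetical toy data: 3 items, 1 covariate, 3 observed comparisons.
alpha = [0.1, -0.1, 0.0]
X = [[1.0], [0.0], [-1.0]]
beta = [0.5]
comparisons = [(0, 1), (0, 2), (1, 2)]

scores = care_scores(alpha, X, beta)
p_01 = win_probability(scores[0], scores[1])
nll = negative_log_likelihood(alpha, beta, X, comparisons)
```

Setting `beta` to all zeros recovers the plain BTL model with scores $θ_i = α_i$, which is the sense in which CARE extends BTL.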