Paper Title
$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets
Paper Authors
Paper Abstract
We study the $p$-generalized probit regression model, a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal CDF, with the CDF of a $p$-generalized normal distribution for $p\in[1, \infty)$. The $p$-generalized normal distributions \citep{Sub23} are of special interest in statistical modeling because they can be fitted to data much more flexibly. Their tail behavior is controlled by the choice of the parameter $p$, which influences the model's sensitivity to outliers. Special cases include the Laplace, the Gaussian, and the uniform distributions. We further show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data by combining sketching techniques with importance subsampling to obtain a small data summary called a coreset.
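To make the model in the abstract concrete, the following is a minimal sketch of the link function and the resulting negative log-likelihood. It assumes the common parametrization of the $p$-generalized normal density, $f_p(t) = \frac{p^{1-1/p}}{2\Gamma(1/p)} \exp(-|t|^p/p)$, which reduces to the Laplace density for $p=1$ and the standard Gaussian for $p=2$. It also assumes labels coded as $\pm 1$. The function names and the optional `weights` argument (standing in for the weights a coreset would attach to sampled points) are hypothetical and not taken from the paper, and the CDF is computed by plain numerical integration rather than by any method the authors describe.

```python
import math


def pgn_pdf(t: float, p: float) -> float:
    """Density of the p-generalized normal distribution (assumed parametrization)."""
    norm = p ** (1.0 - 1.0 / p) / (2.0 * math.gamma(1.0 / p))
    return norm * math.exp(-abs(t) ** p / p)


def pgn_cdf(t: float, p: float, steps: int = 2000) -> float:
    """CDF obtained by composite Simpson integration of the density on [0, |t|].

    `steps` must be even. The upper limit is capped at 40 because the
    remaining tail mass is negligible for p >= 1.
    """
    a = min(abs(t), 40.0)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = pgn_pdf(0.0, p) + pgn_pdf(a, p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pgn_pdf(i * h, p)
    half = min(total * h / 3.0, 0.5)  # mass between 0 and |t|
    return 0.5 + half if t > 0 else 0.5 - half


def neg_log_likelihood(X, y, beta, p, weights=None):
    """Weighted negative log-likelihood: -sum_i w_i * log Phi_p(y_i * <x_i, beta>).

    X is a list of feature rows, y a list of labels in {-1, +1}.
    With weights=None every point has weight 1 (the full-data objective).
    """
    if weights is None:
        weights = [1.0] * len(X)
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        z = yi * sum(a * b for a, b in zip(xi, beta))
        # Clamp to avoid log(0) for extremely negative margins.
        total -= wi * math.log(max(pgn_cdf(z, p), 1e-300))
    return total
```

Calling `neg_log_likelihood(X, y, beta, p=2)` gives the ordinary probit objective under these assumptions, while `p=1` gives a heavier-tailed link that penalizes badly misclassified points less severely. Evaluating the same function on a small reweighted subsample is the kind of objective a coreset is meant to approximate within a factor of $(1+\varepsilon)$.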