Paper Title
FasterRisk: Fast and Accurate Interpretable Risk Scores
Paper Authors
Paper Abstract
Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow. We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach produces a pool of almost-optimal sparse continuous solutions, each with a different support set, using a beam-search algorithm. Each of these continuous solutions is transformed into a separate risk score through a "star ray" search, where a range of multipliers are considered before rounding the coefficients sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.
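The multiplier-and-rounding step described in the abstract can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation or the API of any FasterRisk package: the function names (`logistic_loss`, `round_sequentially`, `multiplier_search`), the synthetic data, the multiplier grid, and the greedy floor-or-ceil rule are all hypothetical choices made here. The sketch takes one sparse continuous solution as given (the beam-search step that would produce it is not shown), scales it by each candidate multiplier, rounds the coefficients one at a time while tracking logistic loss, and keeps the multiplier with the lowest loss.

```python
import numpy as np


def logistic_loss(X, y, w, m=1.0):
    """Mean logistic loss for labels y in {-1, +1}.

    The integer score X @ w is divided by the multiplier m, so the
    model's log-odds are (X @ w) / m.
    """
    z = y * (X @ w) / m
    return float(np.mean(np.logaddexp(0.0, -z)))


def round_sequentially(X, y, w_scaled, m):
    """Round coefficients one at a time (assumed greedy rule).

    For each coefficient, try floor and ceil and keep whichever gives
    the lower logistic loss with the other coefficients held fixed.
    """
    w = np.asarray(w_scaled, dtype=float).copy()
    for j in range(len(w)):
        candidates = (np.floor(w[j]), np.ceil(w[j]))
        losses = []
        for c in candidates:
            trial = w.copy()
            trial[j] = c
            losses.append(logistic_loss(X, y, trial, m))
        w[j] = candidates[int(np.argmin(losses))]
    return w


def multiplier_search(X, y, w_cont, multipliers):
    """Try each multiplier, round, and return the lowest-loss result."""
    best = None
    for m in multipliers:
        w_int = round_sequentially(X, y, w_cont * m, m)
        loss = logistic_loss(X, y, w_int, m)
        if best is None or loss < best[0]:
            best = (loss, m, w_int)
    return best


# Synthetic binary features and labels (illustrative data only).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5)).astype(float)
w_cont = np.array([1.3, -0.7, 0.0, 2.1, 0.0])  # assumed sparse continuous solution
y = np.where(X @ w_cont + rng.normal(size=200) > 1.0, 1.0, -1.0)

multipliers = np.linspace(1.0, 5.0, 9)  # assumed grid of multipliers
loss, m, w_int = multiplier_search(X, y, w_cont, multipliers)
print("multiplier:", m, "integer coefficients:", w_int, "loss:", loss)
```

Zero coefficients stay zero under this rounding rule, so the support set of the continuous solution is preserved. An intercept, if needed, could be handled by appending a column of ones to `X`.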