Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

Paper Title

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

Authors

Kai Wang, Shresth Verma, Aditya Mate, Sanket Shah, Aparna Taneja, Neha Madhiwalla, Aparna Hegde, Milind Tambe

Abstract

This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.
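
To make the Whittle index policy mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes two-state arms, a discount factor of 0.9, a state-only reward, and indexability (so that bisection on the passive subsidy is valid). The function names, the transition matrices, and the search bound are all hypothetical choices made for illustration. In the setting the abstract describes, the transition probabilities would come from a predictive model over arm features; here they are simply hard-coded.

```python
import numpy as np

def q_values(P, R, subsidy, gamma=0.9, iters=300):
    """Value iteration for one arm. P[a, s, s'] holds transition probabilities
    (a=0 passive, a=1 active), R[s] is the state reward, and the passive
    action additionally earns `subsidy`."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # shape (2, n_states)
        Q[0] += subsidy           # subsidy applies to the passive action only
        V = Q.max(axis=0)
    return Q

def whittle_index(P, R, state, gamma=0.9, bound=10.0, steps=50):
    """Bisection for the subsidy at which passive and active are equally good
    in `state`. Assumes the arm is indexable and the index lies in
    [-bound, bound] (an illustrative assumption)."""
    lo, hi = -bound, bound
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        Q = q_values(P, R, mid, gamma)
        if Q[1, state] > Q[0, state]:
            lo = mid              # active still preferred: raise the subsidy
        else:
            hi = mid
    return 0.5 * (lo + hi)

def whittle_policy(P_all, R, states, budget, gamma=0.9):
    """Pull the `budget` arms with the largest Whittle index in their current state."""
    idx = np.array([whittle_index(P, R, s, gamma) for P, s in zip(P_all, states)])
    return np.sort(np.argsort(-idx)[:budget])

# Hypothetical arms: state 1 is the rewarding state.
R = np.array([0.0, 1.0])
P_helpful = np.array([[[0.9, 0.1], [0.5, 0.5]],    # passive
                      [[0.4, 0.6], [0.1, 0.9]]])   # active
P_no_effect = np.array([P_helpful[0], P_helpful[0]])  # acting changes nothing

chosen = whittle_policy([P_no_effect, P_helpful], R, states=[0, 0], budget=1)
```

With a budget of one pull, the policy selects the arm whose transitions actually improve under the active action, since an arm on which acting has no effect has a Whittle index of approximately zero. A predict-then-optimize pipeline would fit the transition model for accuracy and then run a policy like this one; the decision-focused approach in the abstract instead trains the model through the policy's resulting solution quality.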
