Paper Title
Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-implementation Guidelines
Paper Authors
Paper Abstract
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating candidate RL algorithms using the PCS framework. We illustrate the use of the PCS framework for designing an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
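The abstract contains no code, but to make the idea of a simulation environment as an evaluation tool concrete, below is a minimal, hypothetical Python/NumPy sketch of running one candidate RL algorithm, a two-armed Gaussian Thompson sampler, against a stylized user simulator. The class names (`SimulatedUser`, `GaussianThompsonSampler`), the reward model (noisy brushing duration responding to a binary "send message" action), and all numeric parameters are illustrative assumptions, not the Oralytics design described in the paper.

```python
import numpy as np


class SimulatedUser:
    """Stylized (hypothetical) user model: the reward is a noisy brushing
    duration that may respond to the binary action (send a message or not)."""

    def __init__(self, rng):
        # All distributional choices below are illustrative assumptions,
        # not parameters from the paper.
        self.base = rng.normal(120.0, 20.0)   # baseline brushing seconds
        self.effect = rng.normal(10.0, 5.0)   # per-user effect of a message
        self.rng = rng

    def step(self, action):
        return self.base + self.effect * action + self.rng.normal(0.0, 15.0)


class GaussianThompsonSampler:
    """Candidate algorithm: Thompson sampling over the mean reward of two
    actions, with conjugate normal priors and known reward noise variance."""

    def __init__(self, prior_mean=120.0, prior_var=400.0, noise_var=225.0):
        # The prior is assumed roughly calibrated to the simulator's scale.
        self.mean = np.full(2, prior_mean)
        self.var = np.full(2, prior_var)
        self.noise_var = noise_var

    def select_action(self, rng):
        # Draw one plausible mean reward per action; act greedily on the draw.
        draws = rng.normal(self.mean, np.sqrt(self.var))
        return int(np.argmax(draws))

    def update(self, action, reward):
        # Standard conjugate normal posterior update for the chosen action.
        prior_prec = 1.0 / self.var[action]
        post_prec = prior_prec + 1.0 / self.noise_var
        self.var[action] = 1.0 / post_prec
        self.mean[action] = self.var[action] * (
            prior_prec * self.mean[action] + reward / self.noise_var
        )


def evaluate(n_users=50, n_decisions=140, seed=0):
    """Run the candidate algorithm against simulated users and report the
    average reward, one crude metric a designer might track."""
    rng = np.random.default_rng(seed)
    rewards = []
    for _ in range(n_users):
        user = SimulatedUser(rng)
        algo = GaussianThompsonSampler()
        for _ in range(n_decisions):
            action = algo.select_action(rng)
            reward = user.step(action)
            algo.update(action, reward)
            rewards.append(reward)
    return float(np.mean(rewards))


if __name__ == "__main__":
    print(f"average simulated reward: {evaluate():.1f}")
```

In the spirit of the PCS framework, a designer would rerun a loop like this across several simulator variants (to probe stability) and compare candidate algorithms on predictive performance under the compute and latency constraints of the deployment.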