Paper Title
Synthesizing Safe Policies under Probabilistic Constraints with Reinforcement Learning and Bayesian Model Checking
Authors
Abstract
We propose leveraging epistemic uncertainty about a reinforcement learner's constraint satisfaction in safety-critical domains. We introduce a framework for specifying requirements for reinforcement learners in constrained settings, including confidence in the results. We show that an agent's confidence in constraint satisfaction provides a useful signal for balancing optimization and safety during learning.
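To make the notion of "confidence in constraint satisfaction" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each episode either satisfies or violates the constraint independently, places a uniform Beta(1, 1) prior on the unknown satisfaction probability, and computes the posterior probability that this rate meets a required threshold. The names `satisfaction_confidence`, `choose_mode`, `p_req`, and `c_req` are hypothetical and introduced only for illustration.

```python
from math import comb


def satisfaction_confidence(successes: int, failures: int, p_req: float) -> float:
    """Posterior probability that the true satisfaction rate is >= p_req.

    Assumption (not from the paper): i.i.d. Bernoulli episode outcomes
    with a uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1).
    """
    a = successes + 1
    b = failures + 1
    n = a + b - 1
    # For integer a and b, the Beta CDF at x equals P(Binomial(n, x) >= a).
    cdf = sum(
        comb(n, j) * p_req**j * (1.0 - p_req) ** (n - j)
        for j in range(a, n + 1)
    )
    return 1.0 - cdf


def choose_mode(confidence: float, c_req: float) -> str:
    """Hypothetical switch: optimize return only once confidence is high enough."""
    return "optimize" if confidence >= c_req else "safe"


if __name__ == "__main__":
    # 60 safe episodes and 1 violation, requiring a 90% satisfaction rate.
    conf = satisfaction_confidence(successes=60, failures=1, p_req=0.9)
    print(conf, choose_mode(conf, c_req=0.95))
```

Under these assumptions, confidence grows as more satisfying episodes are observed, so a learner could prioritize safety while confidence is low and shift toward optimizing return once the required confidence level is reached.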