Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Paper Title

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Authors

Jiayi Wang, Zhengling Qi, Chengchun Shi

Abstract

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super reinforcement learning, which takes advantage of human-AI interaction for data-driven sequential decision making. This approach uses the observed action, whether taken by an AI or a human, as an input for achieving a stronger oracle in policy learning for the decision maker (human or AI). In a decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information in the policy search in a novel and legitimate manner, the proposed super reinforcement learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies from batch data, a number of nonparametric and causal identification results are established. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties, such as finite-sample regret guarantees. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.
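The claim that a super-policy can outperform both the standard optimal policy and the behavior policy can be made concrete with a small worked example. The sketch below is not the authors' algorithm and not their identification procedure: it is a hypothetical one-step toy model in which the true distribution is assumed known, so every policy value is computed exactly by enumeration. All names (`P_U`, `P_MATCH`, `p_behavior`, `reward`, `value`) and the numbers are assumptions chosen for illustration. A hidden binary variable U drives the reward, the past agent sees U but acts on it poorly, and the new decision maker sees only the past agent's action.

```python
from itertools import product

# Hypothetical toy model (an assumption, not taken from the paper).
P_U = {0: 0.5, 1: 0.5}   # distribution of the unmeasured confounder U
P_MATCH = 0.2            # P(past agent's action == U): the agent is usually wrong


def p_behavior(a_b, u):
    """P(A_b = a_b | U = u) for the past agent, who observes U."""
    return P_MATCH if a_b == u else 1.0 - P_MATCH


def reward(a, u):
    """Reward is 1 when the chosen action matches the hidden U, else 0."""
    return 1.0 if a == u else 0.0


def value(policy):
    """Exact expected reward of policy(a_b) -> action, enumerating (U, A_b)."""
    return sum(
        P_U[u] * p_behavior(a_b, u) * reward(policy(a_b), u)
        for u, a_b in product((0, 1), (0, 1))
    )


# Behavior policy: simply replay the past agent's action.
v_behavior = value(lambda a_b: a_b)

# Standard policies ignore A_b; with no observed state they are constants.
v_standard = max(value(lambda a_b, a=a: a) for a in (0, 1))

# Super-policies may depend on A_b: enumerate all four maps A_b -> action.
super_maps = [dict(zip((0, 1), acts)) for acts in product((0, 1), repeat=2)]
v_super = max(value(lambda a_b, m=m: m[a_b]) for m in super_maps)

print(f"behavior={v_behavior:.2f} standard={v_standard:.2f} super={v_super:.2f}")
# behavior=0.20 standard=0.50 super=0.80
```

In this toy model the best super-policy does the opposite of the past agent, because that agent's action is informative about U even though it is usually wrong. The super-policy class contains both the constant policies and the replay policy, which is why its best member can never do worse than either of them here. Learning such a policy from confounded batch data, rather than from a known model, is the harder problem the abstract refers to.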
